diff --git a/en_US.ISO8859-1/articles/committers-guide/article.sgml b/en_US.ISO8859-1/articles/committers-guide/article.sgml index 2722e67321..571052f549 100644 --- a/en_US.ISO8859-1/articles/committers-guide/article.sgml +++ b/en_US.ISO8859-1/articles/committers-guide/article.sgml @@ -1,2622 +1,2622 @@ %man; %freebsd; %authors; %teams; %mailing-lists; ]>
Committer Guide The FreeBSD Documentation Project $FreeBSD$ 1999 2000 2001 The FreeBSD Documentation Project This document provides information for the FreeBSD committer community. All new committers should read this document before they start, and existing committers are strongly encouraged to review it from time to time. Administrative Details Main Repository Host freefall.FreeBSD.org Login Methods &man.ssh.1; Main CVSROOT /home/ncvs Main &a.cvs; &a.peter; and &a.markm;, as well as &a.joe; for ports/ Mailing Lists &a.developers;, &a.committers; Noteworthy CVS Tags - RELENG_4 (4.x-STABLE), HEAD (-CURRENT) + RELENG_4 (4.X-STABLE), HEAD (-CURRENT) It is required that you use &man.ssh.1; or &man.telnet.1; with Kerberos 5 to connect to the repository hosts. These are generally more secure than plain &man.telnet.1; or &man.rlogin.1; since credential negotiation will always be encrypted. All traffic is encrypted by default with &man.ssh.1;. With utilities like &man.ssh-agent.1; and &man.scp.1; also available, &man.ssh.1; is also far more convenient. If you do not know anything about &man.ssh.1;, please see . Commit Bit Types The FreeBSD CVS repository has a number of components which, when combined, support the basic operating system source, documentation, third party application ports infrastructure, and various maintained utilities. When FreeBSD commit bits are allocated, the areas of the tree where the bit may be used are specified. Generally, the areas associated with a bit reflect who authorized the allocation of the commit bit. Additional areas of authority may be added at a later date: when this occurs, the committer should follow normal commit bit allocation procedures for that area of the tree, seeking approval from the appropriate entity and possibly getting a mentor for that area for some period of time. 
Committer Type Responsible Tree Components src core@ src/, doc/ subject to appropriate review doc nik@ doc/, src/ documentation ports portmgr@ ports/ Commit bits allocated prior to the development of the notion of areas of authority may be appropriate for use in many parts of the tree. However, common sense dictates that a committer who has not previously worked in an area of the tree seek review prior to committing, seek approval from the appropriate responsible party, and/or work with a mentor. Since the rules regarding code maintenance differ by area of the tree, this is as much for the benefit of the committer working in an area of less familiarity as it is for others working on the tree. Committers are encouraged to seek review for their work as part of the normal development process, regardless of the area of the tree where the work is occurring. CVS Operations It is assumed that you are already familiar with the basic operation of CVS. The &a.cvs; are the owners of the CVS repository and are responsible for any and all direct modification of it for the purposes of cleanup or fixing some grievous abuse of CVS by a committer. No one else should attempt to touch the repository directly. Should you cause some repository accident, say a bad cvs import or cvs tag operation, do not attempt to fix it yourself! Mail the &a.cvs; (or call one of them) and report the problem to one of them instead. The only ones allowed to directly fiddle the repository bits are the repomeisters. CVS operations are usually done by logging into freefall, making sure the CVSROOT environment variable is set to /home/ncvs, and then doing the appropriate check-out/check-in operations. If you wish to add something which is wholly new (like contrib-ified sources, etc), cvs import should be used. Refer to the &man.cvs.1; manual page for usage. Note that when you use CVS on freefall, you should set your umask to 2, as well as setting the CVSUMASK environment variable to 2. 
This ensures that any new files created by cvs add will have the correct permissions. If you add a file or directory and discover that the file in the repository has incorrect permissions (specifically, all files in the repository should be group writable by group ncvs), contact one of the repository meisters as described below. If you are familiar with remote CVS and consider yourself pretty studly with CVS in general, you can also do CVS operations directly from your own machine and local working sources. Just remember to set CVS_RSH to ssh so that you are using a relatively secure and reliable transport. If you have no idea what any of the above even means, on the other hand, then please stick with logging into freefall and applying your diffs with &man.patch.1;. If you need to use CVS add and delete operations in a manner that is effectively a &man.mv.1; operation, then a repository copy is in order rather than using CVS add and delete. In a repository copy, a CVS Meister will copy the file(s) to their new name and/or location and let you know when it is done. The purpose of a repository copy is to preserve file change history, or logs. We in the FreeBSD Project greatly value the change history that CVS gives to the project. CVS reference information, tutorials, and FAQs can also be found at: http://www.cvshome.org/docs/, and the information in Karl Fogel's chapters from Open Source Development with CVS are also very useful. &a.des; also supplied the following mini primer for CVS. Check out a module with the co or checkout command. &prompt.user; cvs checkout shazam This checks out a copy of the shazam module. If there is no shazam module in the modules file, it looks for a top-level directory named shazam instead. Useful <command>cvs checkout</command> options Do not create empty directories Check out a single level, no subdirectories Check out revision, branch or tag rev Check out the sources as they were on date date
Practical FreeBSD examples: Check out the miscfs module, which corresponds to src/sys/miscfs: &prompt.user; cvs co miscfs You now have a directory named miscfs with subdirectories CVS, deadfs, devfs, and so on. One of these (linprocfs) is empty. Check out the same files, but with the full path: &prompt.user; cvs co src/sys/miscfs You now have a directory named src, with subdirectories CVS and sys. src/sys has subdirectories CVS and miscfs, etc. Check out the same files, but prune empty directories: &prompt.user; cvs co -P miscfs You now have a directory named miscfs with subdirectories CVS, deadfs, devfs... but note that there is no linprocfs subdirectory, because there are no files in it. Check out the directory miscfs, but none of the subdirectories: &prompt.user; cvs co -l miscfs You now have a directory named miscfs with just one subdirectory named CVS. Check out the miscfs module as - it is in the 4.x branch: + it is in the 4.X branch: &prompt.user; cvs co -rRELENG_4 miscfs You can modify the sources and commit along this branch. Check out the miscfs module as it was in 3.4-RELEASE: &prompt.user; cvs co -rRELENG_3_4_0_RELEASE miscfs You will not be able to commit modifications, since RELENG_3_4_0_RELEASE is a point in time, not a branch. Check out the miscfs module as it was on Jan 15 2000: &prompt.user; cvs co -D'01/15/2000' miscfs You will not be able to commit modifications. Check out the miscfs module as it was one week ago: &prompt.user; cvs co -D'last week' miscfs You will not be able to commit modifications. Note that cvs stores metadata in subdirectories named CVS. Arguments to -D and -r are sticky, which means cvs will remember them later, e.g. when you do a cvs update.
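As a rough illustration of what "sticky" means in practice: CVS records a sticky tag or date in a file named Tag inside the CVS metadata directory, with a one-letter prefix (T for a branch tag, N for a non-branch tag, D for a date). The sketch below does not touch a real checkout. It builds such a file in a scratch directory and decodes it; the scratch layout and the describe_sticky helper are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: report the sticky tag or date recorded for a
# working directory by reading its CVS/Tag file.
describe_sticky() {
    # $1 is a working directory that contains a CVS subdirectory.
    tagfile="$1/CVS/Tag"
    if [ ! -f "$tagfile" ]; then
        echo "no sticky tag or date"
        return 0
    fi
    entry=$(head -n 1 "$tagfile")
    value=${entry#?}    # drop the one-letter type prefix
    case "$entry" in
        T*) echo "sticky branch: $value" ;;
        N*) echo "sticky non-branch tag: $value" ;;
        D*) echo "sticky date: $value" ;;
        *)  echo "unrecognized entry: $entry" ;;
    esac
}

# Simulate what a checkout with -rRELENG_4 would leave behind.
scratch=$(mktemp -d)
mkdir -p "$scratch/miscfs/CVS"
echo "TRELENG_4" > "$scratch/miscfs/CVS/Tag"
describe_sticky "$scratch/miscfs"
rm -rf "$scratch"
```

In a real working copy, cvs status reports the same information in its "Sticky Tag" and "Sticky Date" fields, which is the safer way to check.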
Check the status of checked-out files with the status command. &prompt.user; cvs status shazam This displays the status of the shazam file or of every file in the shazam directory. For every file, the status is given as one of: Up-to-date File is up-to-date and unmodified. Needs Patch File is unmodified, but there is a newer revision in the repository. Locally Modified File is up-to-date, but modified. Needs Merge File is modified, and there is a newer revision in the repository. File had conflicts on merge There were conflicts the last time this file was updated, and they have not been resolved yet. You will also see the local revision and date, the revision number of the newest applicable version (newest applicable because if you have a sticky date, tag or branch, it may not be the actual newest revision), and any sticky tags, dates or options. Once you have checked something out, update it with the update command. &prompt.user; cvs update shazam This updates the shazam file or the contents of the shazam directory to the latest version along the branch you checked out. If you checked out a point in time, does nothing unless the tags have moved in the repository or some other weird stuff is going on. Useful options, in addition to those listed above for checkout: Check out any additional missing directories. Update to head of main branch. More magic (see below). If you checked out a module with or , running cvs update with a different or argument or with will select a new branch, revision or date. The option clears all sticky tags, dates or revisions whereas and set new ones. Theoretically, specifying HEAD as argument to will give you the same result as , but that is just theory. The option is useful if: somebody has added subdirectories to the module you have checked out after you checked it out. you checked out with , and later change your mind and want to check out the subdirectories as well. you deleted some subdirectories and want to check them all back out. 
Watch the output of the cvs update with care. The letter in front of each filename indicates what was done with it: U The file was updated without trouble. P The file was updated without trouble (you will only see this when working against a remote repo). M The file had been modified, and was merged without conflicts. C The file had been modified, and was merged with conflicts. Merging is what happens if you check out a copy of some source code, modify it, then someone else commits a change, and you run cvs update. CVS notices that you have made local changes, and tries to merge your changes with the changes between the version you originally checked out and the one you updated to. If the changes are to separate portions of the file, it will almost always work fine (though the result might not be syntactically or semantically correct). CVS will print an M in front of every locally modified file even if there is no newer version in the repository, so cvs update is handy for getting a summary of what you have changed locally. If you get a C, then your changes conflicted with the changes in the repository (the changes were to the same lines, or neighboring lines, or you changed the local file so much that cvs can not figure out how to apply the repository's changes). You will have to go through the file manually and resolve the conflicts; they will be marked with rows of <, = and > signs. For every conflict, there will be a marker line with seven < signs and the name of the file, followed by a chunk of what your local file contained, followed by a separator line with seven = signs, followed by the corresponding chunk in the repository version, followed by a marker line with seven > signs and the revision number you updated to. The option is slightly voodoo. It updates the local file to the specified revision as if you used , but it does not change the recorded revision number or branch of the local file. 
It is not really useful except when used twice, in which case it will merge the changes between the two specified versions into the working copy. For instance, say you commit a change to shazam/shazam.c in &os.current; and later want to MFC it. The change you want to MFC was revision 1.15: Check out the &os.stable; version of the shazam module: &prompt.user; cvs co -rRELENG_4 shazam Apply the changes between rev 1.14 and 1.15: &prompt.user; cvs update -j1.14 -j1.15 shazam/shazam.c You will almost certainly get a conflict because - of the $Id: article.sgml,v 1.128 2002-07-02 00:04:18 trhodes Exp $ (or in FreeBSD's case, + of the $Id: article.sgml,v 1.129 2002-07-03 23:19:04 jim Exp $ (or in FreeBSD's case, $FreeBSD$) lines, so you will have to edit the file to resolve the conflict (remove the marker lines and - the second $Id: article.sgml,v 1.128 2002-07-02 00:04:18 trhodes Exp $ line, leaving the original - $Id: article.sgml,v 1.128 2002-07-02 00:04:18 trhodes Exp $ line intact). + the second $Id: article.sgml,v 1.129 2002-07-03 23:19:04 jim Exp $ line, leaving the original + $Id: article.sgml,v 1.129 2002-07-03 23:19:04 jim Exp $ line intact). View differences between the local version and the repository version with the diff command. &prompt.user; cvs diff shazam shows you every modification you have made to the shazam file or module. Useful <command>cvs diff</command> options Uses the unified diff format. Uses the context diff format. Shows missing or added files.
You always want to use , since unified diffs are much easier to read than almost any other diff format (in some circumstances, context diffs generated with the option may be better, but they are much bulkier). A unified diff consists of a series of hunks. Each hunk begins with a line that starts with two @ signs and specifies where in the file the differences are and how many lines they span. This is followed by a number of lines; some (preceded by a blank) are context; some (preceded by a - sign) are outtakes and some (preceded by a +) are additions. You can also diff against a different version than the one you checked out by specifying a version with or as in checkout or update, or even view the diffs between two arbitrary versions (without regard for what you have locally) by specifying two versions with or .
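To make the hunk layout concrete, the following sketch builds two small throwaway files and compares them with plain diff -u. The output of cvs diff with the unified option uses the same hunk format, preceded by some CVS-specific header lines. The file names and contents are invented for the example.

```shell
#!/bin/sh
# Build two versions of a tiny file in a scratch directory.
work=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$work/old.txt"
printf 'one\n2\nthree\nfour\n' > "$work/new.txt"

# diff exits with status 1 when the files differ; that is expected here.
diff -u "$work/old.txt" "$work/new.txt" > "$work/change.diff" || true

# Print only the hunk: the @@ header, then context lines (leading blank),
# outtakes (leading -) and additions (leading +).
sed -n '/^@@/,$p' "$work/change.diff"

# The scratch directory is left in place for inspection; remove it with
# rm -rf "$work" when done.
```

The hunk header reads `@@ -1,3 +1,4 @@`: the hunk covers three lines starting at line 1 of the old file and four lines starting at line 1 of the new one.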
View log entries with the log command. &prompt.user; cvs log shazam If shazam is a file, this will print a header with information about this file, such as where in the repository this file is stored, which revision is the HEAD for this file, what branches this file is in, and any tags that are valid for this file. Then, for each revision of this file, a log message is printed. This includes the date and time of the commit, who did the commit, how many lines were added and/or deleted, and finally the log message that the committer who did the change wrote. If shazam is a directory, then the log information described above is printed for each file in the directory in turn. Unless you give the to log, the log for all subdirectories of shazam is printed too, in a recursive manner. Use the log command to view the history of one or more files, as it is stored in the CVS repository. You can even use it to view the log message of a specific revision, if you add the to the log command: &prompt.user; cvs log -r1.2 shazam This will print only the log message for revision 1.2 of file shazam if it is a file, or the log message for revision 1.2 of each file under shazam if it is a directory. See who did what with the annotate command. This command shows you each line of the specified file or files, along with which user most recently changed that line. &prompt.user; cvs annotate shazam Add new files with the add command. Create the file, cvs add it, then cvs commit it. Similarly, you can add new directories by creating them and then cvs adding them. Note that you do not need to commit directories. Remove obsolete files with the remove command. Remove the file, then cvs rm it, then cvs commit it. Commit with the commit or checkin command. Useful <command>cvs commit</command> options Force a commit of an unmodified file. Specify a commit message on the command line rather than invoking an editor.
Use the option if you realize that you left out important information from the commit message. Good commit messages are important. They tell others why you did the changes you did, not just right here and now, but months or years from now when someone wonders why some seemingly illogical or inefficient piece of code snuck into your source file. It is also an invaluable aid to deciding which changes to MFC and which not to MFC. Commit messages should be clear, concise and provide a reasonable summary to give an indication of what was changed and why. Commit messages should provide enough information to enable a third party to decide if the change is relevant to them and if they need to read the change itself. Avoid committing several unrelated changes in one go. It makes merging difficult, and also makes it harder to determine which change is the culprit if a bug crops up. Avoid committing style or whitespace fixes and functionality fixes in one go. It makes merging difficult, and also makes it harder to understand just what functional changes were made. In the case of documentation files, it can make the job of the translation teams more complicated, as it becomes difficult for them to determine exactly what content changes need to be translated. Avoid committing changes to multiple files in one go with a generic, vague message. Instead, commit each file (or small, related groups of files) with tailored commit messages. Before committing, always: verify which branch you are committing to, using cvs status. review your diffs, using cvs diff Also, ALWAYS specify which files to commit explicitly on the command line, so you do not accidentally commit other files than the ones you intended - cvs commit without any arguments will commit every modification in your current working directory and every subdirectory.
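The pre-commit routine above can be wrapped in a small shell function. This is only a sketch: the name precommit_review is invented here, and the CVS variable exists so that the function can be tried without a repository (for example with CVS=echo). It runs cvs status and cvs diff -u on exactly the files named, and refuses to run when no files are given, mirroring the advice to always list files explicitly.

```shell
#!/bin/sh
# Hypothetical helper, not part of CVS or the FreeBSD tree.
# Override CVS (e.g. CVS=echo) to see the commands without running them.
CVS=${CVS:-cvs}

precommit_review() {
    if [ "$#" -eq 0 ]; then
        echo "precommit_review: name the files you intend to commit" >&2
        return 1
    fi
    # The status output includes the sticky tag, i.e. the branch
    # the commit would go to.
    $CVS status "$@"
    # cvs diff exits non-zero when differences exist; that is not an
    # error for the purposes of a review.
    $CVS diff -u "$@" || true
}
```

A typical use would be `precommit_review shazam/shazam.c`, followed by `cvs commit shazam/shazam.c` once the branch and the diff look right.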
Additional tips and tricks: You can place commonly used options in your ~/.cvsrc, like this: cvs -z3 diff -Nu update -Pd checkout -P This example says: always use compression level 3 when talking to a remote server. This is a life-saver when working over a slow connection. always use the (show added or removed files) and (unified diff format) options to &man.diff.1;. always use the (prune empty directories) and (check out new directories) options when updating. always use the (prune empty directories) option when checking out. Use Eivind Eklund's cdiff script to view unidiffs. It is a wrapper for &man.less.1; that adds ANSI color codes to make hunk headers, outtakes and additions stand out; context and garbage are unmodified. It also expands tabs properly (tabs often look wrong in diffs because of the extra character in front of each line). http://people.FreeBSD.org/~eivind/cdiff Simply use it instead of &man.more.1; or &man.less.1;: &prompt.user; cvs diff -Nu shazam | cdiff Alternatively some editors like &man.vim.1; (editors/vim5) have color support and when used as a pager with color syntax highlighting switched on will highlight many types of file, including diffs, patches, and cvs/rcs logs. &prompt.user; echo "syn on" >> ~/.vimrc &prompt.user; cvs diff -Nu shazam | vim - &prompt.user; cvs log shazam | vim - CVS is old, arcane, crufty and buggy, and sometimes exhibits non-deterministic behavior which some claim as proof that it is actually merely the Newtonian manifestation of a sentient transdimensional entity. It is not humanly possible to know its every quirk inside out, so do not be afraid to ask the resident AI (&a.cvs;) for help. Do not leave the cvs commit command in commit message editing mode for too long (more than 2–3 minutes). It locks the directory you are working with and will prevent other developers from committing into the same directory. 
If you have to write a long commit message, write it before running cvs commit, then insert it into the commit message when the editor opens.
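For reference, the ~/.cvsrc entries described in the tips above go one command per line. The sketch below writes them to a scratch file rather than to your real ~/.cvsrc, so the result can be inspected before being copied into place. The CVSRC_DEMO variable is made up for this example.

```shell
#!/bin/sh
# Write the example settings to a scratch file, one command per line.
CVSRC_DEMO=$(mktemp)
cat > "$CVSRC_DEMO" <<'EOF'
cvs -z3
diff -Nu
update -Pd
checkout -P
EOF

# Show the result; copy it to ~/.cvsrc by hand if it looks right.
cat "$CVSRC_DEMO"
```

On the subject of long commit messages, cvs commit also accepts -F file to read the log message from a file prepared in advance, which avoids sitting in the editor while the directory is locked.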
Conventions and Traditions As a new committer there are a number of things you should do first. Add yourself to the Developers section of the Contributors List and remove yourself from the Additional Contributors section. This is a relatively easy task, but remains a good first test of your CVS skills. Add an entry for yourself to www/en/news/news.xml. Look for the other entries that look like A new committer and follow the format. If you have a PGP or GnuPG key, you may want to add it to doc/en_US.ISO8859-1/books/handbook/pgpkeys. &a.des; has written a shell script to make this extremely simple. See the README file for more information. Some people add an entry for themselves to ports/astro/xearth/files/freebsd.committers.markers. Some people add an entry for themselves to src/usr.bin/calendar/calendars/calendar.freebsd. Introduce yourself to the other committers, otherwise no one will have any idea who you are or what you are working on. You do not have to write a comprehensive biography, just write a paragraph or two about who you are and what you plan to be working on as a committer in FreeBSD. Email this to the &a.developers; and you will be on your way! Log into hub.FreeBSD.org and create a /var/forward/user (where user is your username) file containing the e-mail address where you want mail addressed to yourusername@FreeBSD.org to be forwarded. This includes all of the commit messages as well as any other mail addressed to the &a.committers; and the &a.developers;. Really large mailboxes which have taken up permanent residence on hub often get accidentally truncated without warning, so forward it or read it and you will not lose it. If you are subscribed to the &a.cvsall;, you will probably want to unsubscribe to avoid receiving duplicate copies of commit messages and their followups. All new committers also have a mentor assigned to them for the first few months. 
Your mentor is more or less responsible for explaining anything which is confusing to you and is also responsible for your actions during this initial period. If you make a bogus commit, it is only going to embarrass your mentor and you should probably make it a policy to pass at least your first few commits by your mentor before committing it to the repository. All commits should go to &os.current; first before being merged to &os.stable;. No major new features or high-risk modifications should be made to the &os.stable; branch. Developer Relations If you are working directly on your own code or on code which is already well established as your responsibility, then there is probably little need to check with other committers before jumping in with a commit. If you see a bug in an area of the system which is clearly orphaned (and there are a few such areas, to our shame), the same applies. If, however, you are about to modify something which is clearly being actively maintained by someone else (and it is only by watching the cvs-committers mailing list that you can really get a feel for just what is and is not) then consider sending the change to them instead, just as you would have before becoming a committer. For ports, you should contact the listed MAINTAINER in the Makefile. For other parts of the repository, if you are unsure who the active maintainer might be, it may help to scan the output of cvs log to see who has committed changes in the past. &a.fenner; has written a nice shell script that can help determine who the active maintainer might be. It lists each person who has committed to a given file along with the number of commits each person has made. It can be found on freefall at ~fenner/bin/whodid. If your queries go unanswered or the committer otherwise indicates a lack of proprietary interest in the area affected, go ahead and commit it. If you are unsure about a commit for any reason at all, have it reviewed by -hackers before committing. 
Better to have it flamed then and there rather than when it is part of the CVS repository. If you do happen to commit something which results in controversy erupting, you may also wish to consider backing the change out again until the matter is settled. Remember – with CVS we can always change it back. GNATS The FreeBSD Project utilizes GNATS for tracking bugs and change requests. Be sure that if you commit a fix or suggestion found in a GNATS PR, you use edit-pr pr-number on freefall to close it. It is also considered nice if you take time to close any PRs associated with your commits, if appropriate. You can also make use of &man.send-pr.1; yourself for proposing any change which you feel should probably be made, pending a more extensive peer-review first. You can find out more about GNATS at: http://www.cs.utah.edu/csinfo/texinfo/gnats/gnats.html http://www.FreeBSD.org/support.html http://www.FreeBSD.org/send-pr.html &man.send-pr.1; You can run a local copy of GNATS, and then integrate the FreeBSD GNATS tree into it using CVSup. Then you can run GNATS commands locally, or use other interfaces, such as tkgnats. This lets you query the PR database without needing to be connected to the Internet. Using a local GNATS tree If you are not already downloading the GNATS tree, add this line to your supfile, and re-sup. Note that since GNATS is not under CVS control it has no tag, so if you are adding it to your existing supfile it should appear before any tag= entry as these remain active once set. gnats release=current prefix=/usr This will place the FreeBSD GNATS tree in /usr/gnats. You can use a refuse file to control which categories to receive. For example, to only receive docs PRs, put this line in /usr/local/etc/cvsup/sup/refuse (the precise path depends on the *default base setting in your supfile): gnats/[a-ce-z]* The rest of these examples assume you have only supped the docs category. Adjust them as necessary, depending on the categories you are synching. 
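The refuse pattern gnats/[a-ce-z]* works by excluding every category whose name starts with a lowercase letter other than d. The sketch below checks a few sample category names against that pattern, assuming refuse entries are matched with sh-style patterns. The is_refused helper and the list of sample categories are chosen only for illustration.

```shell
#!/bin/sh
# The refuse pattern from the text above.
refuse_pattern='gnats/[a-ce-z]*'

# Hypothetical helper: succeed if the given path matches the pattern.
is_refused() {
    case "$1" in
        # Left unquoted so that the shell treats it as a pattern.
        $refuse_pattern) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample category names; only those starting with "d" get through.
for category in bin conf docs kern ports; do
    if is_refused "gnats/$category"; then
        echo "$category: refused"
    else
        echo "$category: received"
    fi
done
```

Note that the pattern lets through any category beginning with d, not just docs. If another such category existed and was unwanted, it would need its own refuse line.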
Install the GNATS port from ports/databases/gnats. This will place the various GNATS directories under $PREFIX/share/gnats. Symlink the GNATS directories you are supping under the version of GNATS you have installed. &prompt.root; cd /usr/local/share/gnats/gnats-db &prompt.root; ln -s /usr/gnats/docs Repeat as necessary, depending on how many GNATS categories you are synching. Update the GNATS categories file with these categories. The file is $PREFIX/share/gnats/gnats-db/gnats-adm/categories. # This category is mandatory pending:Category for faulty PRs:gnats-admin: # # FreeBSD categories # docs:Documentation Bug:nik: Run $PREFIX/libexec/gnats/gen-index to recreate the GNATS index. The output has to be redirected to $PREFIX/share/gnats/gnats-db/gnats-adm/index. You can do this periodically from &man.cron.8;, or run &man.cvsup.1; from a shell script that does this as well. &prompt.root; /usr/local/libexec/gnats/gen-index \ > /usr/local/share/gnats/gnats-db/gnats-adm/index Test the configuration by querying the PR database. This command shows open docs PRs. &prompt.root; query-pr -c docs -s open Other interfaces, such as that provided by the databases/tkgnats port should also work nicely. Pick a PR and close it. This procedure only works to allow you to view and query the PRs locally. To edit or close them you will still have to log in to freefall and do it from there. Who's Who Besides the repository meisters, there are other FreeBSD project members and teams whom you will probably get to know in your role as a committer. Briefly, and by no means all-inclusively, these are: &a.jhb; John is the manager of the SMPng Project, and has authority over the architectural design and implementation of the move to fine-grained kernel threading and locking. He's also the editor of the SMPng Architecture Document. If you're working on fine-grained SMP and locking, please coordinate with John. 
You can learn more about the SMPng Project on its home page: http://www.FreeBSD.org/smp/ &a.jake;, &a.tmm; Jake and Thomas are the maintainers of the sparc64 hardware port. &a.nik; Nik oversees the Documentation Project. As well as writing documentation he put together the infrastructure under doc/share/mk and the stylesheets and related code under doc/share/sgml. If you have questions about these you are encouraged to send them via the &a.doc;. Committers interested in contributing to the documentation should familiarize themselves with the Documentation Project Primer. &a.ru; Ruslan is Mister &man.mdoc.7;. If you are writing a man page and need some advice on the structure, or the markup, ask Ruslan. &a.bde; Bruce is the Style Police-Meister. When you do a commit that could have been done better, Bruce will be there to tell you. Be thankful that someone is. Bruce is also very knowledgeable on the various standards applicable to FreeBSD. &a.gallatin; &a.mjacob; &a.dfr; &a.obrien; These are the primary developers and overseers of the DEC Alpha AXP platform. &a.dg; David is the overseer of the VM system. If you have a VM system change in mind, coordinate it with David. &a.murray; &a.steve; &a.rwatson; &a.jhb; &a.bmah; These are the members of the &a.re;. This team is responsible for setting release deadlines and controlling the release process. During code freezes, the release engineers have final authority on all changes to the system for whichever branch is pending release status. If there is something you want merged from &os.current; to &os.stable; (whatever values those may have at any given time), these are the people to talk to about it. Bruce is also the keeper of the release documentation (src/release/doc/*). If you commit a change that you think is worthy of mention in the release notes, please make sure Bruce knows about it. Better still, send him a patch with your suggested commentary. &a.benno; Benno is the official maintainer of the PowerPC port. 
&a.brian; Official maintainer of /usr/sbin/ppp. &a.nectar; Jacques is the FreeBSD Security Officer and oversees the &a.security-officer;. &a.wollman; If you need advice on obscure network internals or are not sure of some potential change to the networking subsystem you have in mind, Garrett is someone to talk to. Garrett is also very knowledgeable on the various standards applicable to FreeBSD. &a.committers; cvs-committers is the entity that CVS uses to send you all your commit messages. You should never send email directly to this list. You should only send replies to this list when they are short and are directly related to a commit. &a.developers; developers is all committers. This list was created to be a forum for committer community issues. Examples are Core voting, announcements, etc. This list is not intended as a place for code reviews or a replacement for the &a.arch; or the &a.audit;. In fact, using it as such hurts the FreeBSD Project, as it gives the impression of a closed list where general decisions affecting the whole FreeBSD-using community are made without being open. Last, but not least: never, ever email the &a.developers; and CC:/BCC: another FreeBSD list. Never, ever email another FreeBSD email list and CC:/BCC: the &a.developers;. Doing so can greatly diminish the benefits of this list. Also, never publicly post or forward emails sent to the &a.developers;. The act of sending to the &a.developers; rather than to a public list means the information in the email is not for public consumption. SSH Quick-Start Guide If you are using FreeBSD 4.0 or later, OpenSSH is included in the base system. If you are using an earlier release, update and install one of the SSH ports. In general, you will probably want to get OpenSSH from the security/openssh port. You may also wish to check out the original ssh1 in the security/ssh port, but make certain you pay attention to its license. Note that these two ports cannot both be installed at the same time. 
If you do not wish to type your password in every time you use &man.ssh.1;, and you use RSA or DSA keys to authenticate, &man.ssh-agent.1; is there for your convenience. If you want to use &man.ssh-agent.1;, make sure that you run it before running other applications. X users, for example, usually do this from their .xsession or .xinitrc file. See &man.ssh-agent.1; for details. Generate a key pair using &man.ssh-keygen.1;. The key pair will wind up in your $HOME/.ssh directory. Send your public key ($HOME/.ssh/identity.pub) to the person setting you up as a committer so it can be put into your authorized_keys file in your home directory on freefall (i.e. $HOME/.ssh/authorized_keys). Now you should be able to use &man.ssh-add.1; for authentication once per session. This will prompt you for your private key's pass phrase, and then store it in your authentication agent (&man.ssh-agent.1;). If you no longer wish to have your key stored in the agent, issuing ssh-add -d will remove it. Test by doing something such as ssh freefall.FreeBSD.org ls /usr. For more information, see security/openssh, &man.ssh.1;, &man.ssh-add.1;, &man.ssh-agent.1;, &man.ssh-keygen.1;, and &man.scp.1;. The FreeBSD Committers' Big List of Rules Respect other committers. Respect other contributors. Discuss any significant change before committing. Respect existing maintainers (if listed in the MAINTAINER field in Makefile or in the MAINTAINER file in the top-level directory). Never touch the repository directly. Ask a Repomeister. Any disputed change must be backed out pending resolution of the dispute if requested by a maintainer. Security related changes may override a maintainer's wishes at the Security Officer's discretion. Changes go to &os.current; before &os.stable; unless specifically permitted by the release engineer or unless they are not applicable to &os.current;. 
Any non-trivial or non-urgent change which is applicable should also be allowed to sit in &os.current; for at least 3 days before merging so that it can be given sufficient testing. The release engineer has the same authority over the &os.stable; branch as outlined for the maintainer in rule #6. Do not fight in public with other committers; it looks bad. If you must strongly disagree about something, do so only in private. Respect all code freezes and read the committers and developers mailing lists in a timely manner so you know when a code freeze is in effect. When in doubt on any procedure, ask first! Test your changes before committing them. Do not commit to anything under the src/contrib, src/crypto, and src/sys/contrib trees without explicit approval from the respective maintainer(s). As noted, breaking some of these rules can be grounds for suspension or, upon repeated offense, permanent removal of commit privileges. Individual members of core have the power to temporarily suspend commit privileges until core as a whole has the chance to review the issue. In case of an emergency (a committer doing damage to the repository), a temporary suspension may also be done by the repository meisters. Only a 2/3 majority of core has the authority to suspend commit privileges for longer than a week or to remove them permanently. This rule does not exist to set core up as a bunch of cruel dictators who can dispose of committers as casually as empty soda cans, but to give the project a kind of safety fuse. If someone is out of control, it is important to be able to deal with this immediately rather than be paralyzed by debate. In all cases, a committer whose privileges are suspended or revoked is entitled to a hearing by core, the total duration of the suspension being determined at that time. 
A committer whose privileges are suspended may also request a review of the decision after 30 days and every 30 days thereafter (unless the total suspension period is less than 30 days). A committer whose privileges have been revoked entirely may request a review after a period of 6 months has elapsed. This review policy is strictly informal and, in all cases, core reserves the right to either act on or disregard requests for review if they feel their original decision to be the right one. In all other aspects of project operation, core is a subset of committers and is bound by the same rules. Just because someone is in core does not mean that they have special dispensation to step outside of any of the lines painted here; core's special powers only kick in when it acts as a group, not on an individual basis. As individuals, the core team members are all committers first and core second. Details Respect other committers. This means that you need to treat other committers as the peer-group developers that they are. Despite our occasional attempts to prove the contrary, one does not get to be a committer by being stupid and nothing rankles more than being treated that way by one of your peers. Whether we always feel respect for one another or not (and everyone has off days), we still have to treat other committers with respect at all times or the whole team structure rapidly breaks down. Being able to work together long term is this project's greatest asset, one far more important than any set of changes to the code, and turning arguments about code into issues that affect our long-term ability to work harmoniously together is just not worth the trade-off by any conceivable stretch of the imagination. To comply with this rule, do not send email when you are angry or otherwise behave in a manner which is likely to strike others as needlessly confrontational.
First calm down, then think about how to communicate in the most effective fashion for convincing the other person(s) that your side of the argument is correct; do not just blow off some steam so you can feel better in the short term at the cost of a long-term flame war. Not only is this very bad energy economics, but repeated displays of public aggression which impair our ability to work well together will be dealt with severely by the project leadership and may result in suspension or termination of your commit privileges. That is never an option which the project's leadership enjoys in the slightest, but unity comes first. No amount of code or good advice is worth trading that away. Respect other contributors. You were not always a committer. At one time you were a contributor. Remember that at all times. Remember what it was like trying to get help and attention. Do not forget that your work as a contributor was very important to you. Remember what it was like. Do not discourage, belittle, or demean contributors. Treat them with respect. They are our committers in waiting. They are every bit as important to the project as committers. Their contributions are as valid and as important as your own. After all, you made many contributions before you became a committer. Always remember that. Consider the points raised under and apply them also to contributors. Discuss any significant change before committing. The CVS repository is not where changes should be initially submitted for correctness or argued over; that should happen first on the mailing lists, and the change should be committed only once something resembling consensus has been reached. This does not mean that you have to ask permission before correcting every obvious syntax error or man page misspelling, simply that you should try to develop a feel for when a proposed change is not quite such a no-brainer and requires some feedback first.
People really do not mind sweeping changes if the result is something clearly better than what they had before; they just do not like being surprised by those changes. The very best way of making sure that you are on the right track is to have your code reviewed by one or more other committers. When in doubt, ask for review! Respect existing maintainers if listed. Many parts of FreeBSD are not owned in the sense that any specific individual will jump up and yell if you commit a change to their area, but it still pays to check first. One convention we use is to put a maintainer line in the Makefile for any package or subtree which is being actively maintained by one or more people; see http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/developers-handbook/policies.html for documentation on this. Where sections of code have several maintainers, commits to affected areas by one maintainer need to be reviewed by at least one other maintainer. In cases where the maintainer-ship of something is not clear, you can also look at the CVS logs for the file(s) in question and see if someone has been working recently or predominantly in that area. Other areas of FreeBSD fall under the control of someone who manages an overall category of FreeBSD evolution, such as internationalization or networking. See http://www.FreeBSD.org/doc/en_US.ISO8859-1/articles/contributors/staff-who.html for more information on this. Never touch the repository directly. Ask a Repomeister. This is pretty clear - you are not allowed to make direct modifications to the CVS repository, period. In case of difficulty, ask one of the repository meisters by sending mail to the &a.cvs; and simply wait for them to fix the problem and get back to you. Do not attempt to fix the problem yourself! If you are thinking about putting down a tag or doing a new import of code on a vendor branch, you might also find it useful to ask for advice first.
A lot of people get this wrong the first few times and the consequences are expensive in terms of files touched and angry CVSup/CTM folks who are suddenly getting a lot of changes sent over unnecessarily. Any disputed change must be backed out pending resolution of the dispute if requested by a maintainer. Security related changes may override a maintainer's wishes at the Security Officer's discretion. This may be hard to swallow in times of conflict (when each side is convinced that they are in the right, of course) but CVS makes it unnecessary to have an ongoing dispute raging when it is far easier to simply reverse the disputed change, get everyone calmed down again and then try to figure out what is the best way to proceed. If the change turns out to be the best thing after all, it can be easily brought back. If it turns out not to be, then the users did not have to live with the bogus change in the tree while everyone was busily debating its merits. People very very rarely call for back-outs in the repository since discussion generally exposes bad or controversial changes before the commit even happens, but on such rare occasions the back-out should be done without argument so that we can get immediately on to the topic of figuring out whether it was bogus or not. Changes go to &os.current; before &os.stable; unless specifically permitted by the release engineer or unless they are not applicable to &os.current;. Any non-trivial or non-urgent change which is applicable should also be allowed to sit in &os.current; for at least 3 days before merging so that it can be given sufficient testing. The release engineer has the same authority over the &os.stable; branch as outlined in rule #6. This is another do not argue about it issue since it is the release engineer who is ultimately responsible (and gets beaten up) if a change turns out to be bad. Please respect this and give the release engineer your full cooperation when it comes to the &os.stable; branch. 
The management of &os.stable; may frequently seem to be overly conservative to the casual observer, but also bear in mind the fact that conservatism is supposed to be the hallmark of &os.stable; and different rules apply there than in &os.current;. There is also really no point in having &os.current; be a testing ground if changes are merged over to &os.stable; immediately. Changes need a chance to be tested by the &os.current; developers, so allow some time to elapse before merging unless the &os.stable; fix is critical, time sensitive or so obvious as to make further testing unnecessary (spelling fixes to man pages, obvious bug/typo fixes, etc.). In other words, apply common sense. Changes to the security branches (for example, RELENG_4_5) must be approved by a member of the &a.security-officer;, or in some cases, by a member of the &a.re;. Do not fight in public with other committers; it looks bad. If you must strongly disagree about something, do so only in private. This project has a public image to uphold and that image is very important to all of us, especially if we are to continue to attract new members. There will be occasions when, despite everyone's very best attempts at self-control, tempers are lost and angry words are exchanged. The best thing that can be done in such cases is to minimize the effects of this until everyone has cooled back down. That means that you should not air your angry words in public and you should not forward private correspondence to public mailing lists or aliases. What people say one-to-one is often much less sugar-coated than what they would say in public, and such communications therefore have no place there - they only serve to inflame an already bad situation. If the person sending you a flame-o-gram at least had the grace to send it privately, then have the grace to keep it private yourself.
If you feel you are being unfairly treated by another developer, and it is causing you anguish, bring the matter up with core rather than taking it public. Core will do its best to play peace makers and get things back to sanity. In cases where the dispute involves a change to the codebase and the participants do not appear to be reaching an amicable agreement, core may appoint a mutually-agreeable 3rd party to resolve the dispute. All parties involved must then agree to be bound by the decision reached by this 3rd party. Respect all code freezes and read the committers and developers mailing list on a timely basis so you know when a code freeze is in effect. Committing unapproved changes during a code freeze is a really big mistake and committers are expected to keep up-to-date on what is going on before jumping in after a long absence and committing 10 megabytes worth of accumulated stuff. People who abuse this on a regular basis will have their commit privileges suspended until they get back from the FreeBSD Happy Reeducation Camp we run in Greenland. When in doubt on any procedure, ask first! Many mistakes are made because someone is in a hurry and just assumes they know the right way of doing something. If you have not done it before, chances are good that you do not actually know the way we do things and really need to ask first or you are going to completely embarrass yourself in public. There is no shame in asking how in the heck do I do this? We already know you are an intelligent person; otherwise, you would not be a committer. Test your changes before committing them. This may sound obvious, but if it really were so obvious then we probably would not see so many cases of people clearly not doing this. If your changes are to the kernel, make sure you can still compile both GENERIC and LINT. If your changes are anywhere else, make sure you can still make world. 
If your changes are to a branch, make sure your testing occurs with a machine which is running that code. If you have a change which also may break another architecture, be sure to test on all supported architectures. Currently, this is only the x86 and the Alpha so it is pretty easy to do. If you need to test on the AXP, your account on beast.FreeBSD.org will let you compile and test Alpha binaries/kernels/etc. As other architectures are added to the FreeBSD supported platforms list, the appropriate shared testing resources will be made available. Do not commit to anything under the src/contrib, src/crypto, and src/sys/contrib trees without explicit approval from the respective maintainer(s). The trees mentioned above are for contributed software usually imported onto a vendor branch. Committing something there, even if it does not take the file off the vendor branch, may cause unnecessary headaches for those responsible for maintaining that particular piece of software. Thus, unless you have explicit approval from the maintainer (or you are the maintainer), do not commit there! Please note that this does not mean you should not try to improve the software in question; you are still more than welcome to do so. Ideally, you should submit your patches to the vendor. If your changes are FreeBSD-specific, talk to the maintainer; they may be willing to apply them locally. But whatever you do, do not commit there by yourself! Contact the &a.core; if you wish to take up maintainership of an unmaintained part of the tree. Other Suggestions When committing documentation changes, use a spell checker before committing. For all SGML docs, you should also verify that your formatting directives are correct by running make lint. For all on-line manual pages, run manck (from ports) over the man page to verify that all of the cross references and file references are correct and that the man page has all of the appropriate MLINKs installed.
Do not mix style fixes with new functionality. A style fix is any change which does not modify the functionality of the code. Mixing the changes obfuscates the functionality change when using cvs diff, which can hide any new bugs. Do not include whitespace changes with content changes in commits to doc/ or www/. The extra clutter in the diffs makes the translators' job much more difficult. Instead, make any style or whitespace changes in separate commits that are clearly labeled as such in the commit message. Deprecating Features When it is necessary to remove functionality from software in the base system the following guidelines should be followed whenever possible: Mention is made in the manual page and possibly the release notes that the option, utility, or interface is deprecated. Use of the deprecated feature generates a warning. The option, utility, or interface is preserved until the next major (point zero) release. The option, utility, or interface is removed and no longer documented. It is now obsolete. It is also generally a good idea to note its removal in the release notes. Ports Specific FAQ Adding a New Port How do I add a new port? First, please read the section about repository copy. The easiest way to add a new port is to use the addport script on freefall. It will add a port from the directory you specify, determining the category automatically from the port Makefile. It will also add an entry to the CVSROOT/modules file and the port's category Makefile. It was written by &a.mharo; and &a.will;, but Will is the current maintainer so please send questions/patches about addport to him. Any other things I need to know when I add a new port? Check the port, preferably to make sure it compiles and packages correctly. 
This is the recommended sequence: &prompt.root; make install &prompt.root; make package &prompt.root; make deinstall &prompt.root; pkg_add package you built above &prompt.root; make deinstall &prompt.root; make reinstall &prompt.root; make package The Porters Handbook contains more detailed instructions. Use &man.portlint.1; to check the syntax of the port. You do not necessarily have to eliminate all warnings but make sure you have fixed the simple ones. If the port came from a submitter who has not contributed to the project before, add that person's name to the Additional Contributors section of the FreeBSD Contributors List. Close the PR if the port came in as a PR. To close a PR, just do edit-pr PR# on freefall and change the state from open to closed. You will be asked to enter a log message and then you are done. Repository Copies When do we need a repository copy? When you want to add a port that is related to any port that is already in the tree in a separate directory, please send mail to the ports manager asking about it. Here related means it is a different version or a slightly modified version. Examples are print/ghostscript* (different versions) and x11-wm/windowmaker* (English-only and internationalized version). Another example is when a port is moved from one subdirectory to another, or when you want to change the name of a directory because the author(s) renamed their software even though it is a descendant of a port already in a tree. When do we not need a repository copy? When there is no history to preserve. If a port is added into a wrong category and is moved immediately, it suffices to simply cvs remove the old one and addport the new one. What do I need to do? Send mail to the ports manager, who will do a copy from the old location/name to the new location/name. 
You will then get a notice, at which point you are expected to perform the following: When a port has been repo copied: Upgrade the copied port to the new version (remember to change the PORTNAME so there aren't duplicate ports with the same name). Add the new subdirectory to the SUBDIR listing in the parent directory Makefile. You can run make checksubdirs in the parent directory to check this. If the port changed categories, modify the CATEGORIES line of the port's Makefile accordingly. Add the new module entry. When removing a port: Perform a thorough check of the ports collection for any dependencies on the old port location/name, and update them. Running grep on INDEX is not enough because some ports have dependencies enabled by compile-time options. A full grep -r of the ports collection is recommended. Remove the old port, the old SUBDIR entry and the old module entry. After repo moves (rename operations where a port is copied and the old location is removed): Follow the same steps that are outlined in the previous two entries, to activate the new location of the port and remove the old one. Ports Freeze What is a ports freeze? Before a release, it is necessary to restrict commits to the ports tree for a short period of time while the packages and the release itself are being built. This is to ensure consistency among the various parts of the release, and is called the ports freeze. How long is a ports freeze? Usually an hour or two. What does it mean to me? During the ports freeze, you are not allowed to commit anything to the tree without explicit approval from the ports manager. Explicit approval here means either of the following: You asked the ports manager and got a reply saying, Go ahead and commit it. The ports manager sent a mail to you or the mailing lists during the ports freeze pointing out that the port is broken and has to be fixed. Note that you do not have implicit permission to fix a port during the freeze just because it is broken.
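The full recursive search recommended under "When removing a port" above can be sketched as a small shell function. This is only an illustration, not project tooling: the function name find_port_refs, the default tree of /usr/ports, and the choice to look only at files named Makefile* are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ports infrastructure): list every
# Makefile under a ports tree that mentions a port's old location,
# for example "x11-wm/windowmaker".
# The tree defaults to $PORTSDIR, or /usr/ports if that is unset.
find_port_refs() {
    origin="$1"
    tree="${2:-${PORTSDIR:-/usr/ports}}"
    # -F: treat the origin as a fixed string, not a regular expression.
    # -l: print only the names of files that contain a match.
    find "$tree" -type f -name 'Makefile*' -exec grep -lF -- "$origin" {} +
}
```

Calling find_port_refs x11-wm/windowmaker prints one path per matching Makefile. The match is a plain substring search, so ports whose names begin with the same string, and the old port's own Makefile, may show up too and need to be filtered by eye.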
How do I know when the ports freeze starts? The ports manager will send out warning messages to the &a.ports; and &a.committers; announcing the start of the impending release, usually two or three weeks in advance. The exact starting time will not be determined until a few days before the actual release. This is because the ports freeze has to be synchronized with the release, and it is usually not known until then when exactly the release will be rolled. When the freeze starts, there will be another announcement to the &a.committers;, of course. How do I know when the ports freeze ends? A few hours after the release, the ports manager will send out a mail to the &a.ports; and &a.committers; announcing the end of the ports freeze. Note that the release being cut does not automatically end the freeze. We have to make sure there will not be any last minute snafus that result in an immediate re-rolling of the release. Miscellaneous Questions How do I know if my port is building correctly or not? First, go check http://bento.FreeBSD.org/~asami/errorlogs/. There you will find error logs from the latest package building runs on 3-stable, 4-stable and 5-current. However, just because the port does not show up there does not mean it is building correctly. (One of the dependencies may have failed, for instance.) Here are the relevant directories on bento, so feel free to dig around. 
/a/asami/portbuild/3/errors         error logs from latest 3-stable run
                    /logs           all logs from latest 3-stable run
                    /packages       packages from latest 3-stable run
                    /bak/errors     error logs from last complete 3-stable run
                    /bak/logs       all logs from last complete 3-stable run
                    /bak/packages   packages from last complete 3-stable run
                  /4/errors         error logs from latest 4-stable run
                    /logs           all logs from latest 4-stable run
                    /packages       packages from latest 4-stable run
                    /bak/errors     error logs from last complete 4-stable run
                    /bak/logs       all logs from last complete 4-stable run
                    /bak/packages   packages from last complete 4-stable run
                  /5/errors         error logs from latest 5-current run
                    /logs           all logs from latest 5-current run
                    /packages       packages from latest 5-current run
                    /bak/errors     error logs from last complete 5-current run
                    /bak/logs       all logs from last complete 5-current run
                    /bak/packages   packages from last complete 5-current run
Basically, if the port shows up in packages, or it is in logs but not in errors, it built fine. (The errors directories are what you get from the web page.) I added a new port. Do I need to add it to the INDEX? No. The ports manager will regenerate the INDEX and commit it every few days. Are there any other files I am not allowed to touch? Any file directly under ports/, or any file under a subdirectory that starts with an uppercase letter (Mk/, Tools/, etc.). In particular, the ports manager is very protective of ports/Mk/bsd.port*.mk so do not commit changes to those files unless you want to face his wra(i)th. What is the proper procedure for updating the checksum for a port's distfile when the file changes without a version change? When the checksum for a port's distfile is updated due to the author updating the file without changing the port's revision, the commit message should include a summary of the relevant diffs between the original and new distfile to ensure that the distfile has not been corrupted or maliciously altered.
If the current version of the port has been in the ports tree for a while, a copy of the old distfile will usually be available on the ftp servers; otherwise the author or maintainer should be contacted to find out why the distfile has changed. Perks of the Job Unfortunately, there aren't many perks involved with being a committer. Recognition as a competent software engineer is probably the only thing that will be of benefit in the long run. However, there are at least some perks: Direct access to cvsup-master As a committer, you may apply to &a.jdp; for direct access to cvsup-master.FreeBSD.org, providing the public key output from cvpasswd yourusername@FreeBSD.org cvsup-master.FreeBSD.org. Access to cvsup-master should not be over-used as it is a busy machine. A Free DVD Subscription FreeBSD Services Limited offer a free DVD subscription to FreeBSD committers. To take advantage of this offer, go to www.freebsd-services.com and fill out their customer form, making sure that you tick the FreeBSD Committer and free subscription check-boxes. A message will be sent to your FreeBSD.org email address asking for confirmation. Just reply to the mail, quoting the message and updating the Membership Valid field with a Y. You can confirm that the reply was sent successfully by logging in to their site and checking that your Current Status is set to Associated. In addition to the free subscription, committers are also entitled to a 10% discount on all products on the site. A Free 4-CD Set Subscription FreeBSD Mall, Inc. offers a free subscription of the official 4-CD set to all FreeBSD committers. Information about how to obtain your free CD is mailed to developers@FreeBSD.org following each major release. Miscellaneous Questions Why are trivial or cosmetic changes to files on a vendor branch a bad idea? From now on, every new vendor release of that file will need to have patches merged in by hand. 
From now on, every new vendor release of that file will need to have patches verified by hand. The option does not work very well. Ask &a.obrien; for horror stories. How do I add a new file to a CVS branch? To add a file onto a branch, simply checkout or update to the branch you want to add to and then add the file using cvs add as you normally would. For example, if you wanted to MFC the file src/sys/alpha/include/smp.h from HEAD to RELENG_4 and it does not exist in RELENG_4 yet, you would use the following steps: MFC'ing a New File &prompt.user; cd sys/alpha/include &prompt.user; cvs update -rRELENG_4 cvs update: Updating . U clockvar.h U console.h ... &prompt.user; cvs update -kk -Ap smp.h > smp.h =================================================================== Checking out smp.h RCS: /usr/cvs/src/sys/alpha/include/smp.h,v VERS: 1.1 *************** &prompt.user; cvs add smp.h cvs add: scheduling file `smp.h' for addition on branch `RELENG_4' cvs add: use 'cvs commit' to add this file permanently &prompt.user; cvs commit What meta information should I include in a commit message? As well as including an informative message with each commit, you may need to include some additional information. This information consists of one or more lines containing the key word or phrase, a colon, tabs for formatting, and then the additional information. The key words or phrases are: PR: The problem report (if any) which is affected (typically, by being closed) by this commit. Submitted by: The name and e-mail address of the person that submitted the fix; for committers, just the username on the FreeBSD cluster. Reviewed by: The name and e-mail address of the person or people that reviewed the change; for committers, just the username on the FreeBSD cluster. If a patch was submitted to a mailing list for review, and the review was favorable, then just include the list name.
Approved by: The name and e-mail address of the person or people that approved the change; for committers, just the username on the FreeBSD cluster. It is customary to get prior approval for a commit if it is to an area of the tree to which you do not usually commit. In addition, during the run up to a new release all commits must be approved by the release engineering team. If these are your first commits then you should have passed them past your mentor first, and you should list your mentor, as in ``username-of-mentor (mentor)''. Obtained from: The name of the project (if any) from which the code was obtained. MFC after: If you wish to receive an e-mail reminder to MFC at a later date, specify the number of days, weeks, or months after which an MFC is planned. Commit log for a commit based on a PR You want to commit a change based on a PR submitted by John Smith containing a patch. The end of the commit message should look something like this. ... PR: foo/12345 Submitted by: John Smith <John.Smith@example.com> Commit log for a commit needing review You want to change the virtual memory system. You have posted patches to the appropriate mailing list (in this case, freebsd-arch) and the changes have been approved. ... Reviewed by: -arch Commit log for a commit needing approval You want to commit a change to a section of the tree with a MAINTAINER assigned. You have collaborated with the listed MAINTAINER, who has told you to go ahead and commit. ... Approved by: abc Where abc is the account name of the person who approved. Commit log for a commit bringing in code from OpenBSD You want to commit some code based on work done in the OpenBSD project. ... Obtained from: OpenBSD Commit log for a change to &os.current; with a planned commit to &os.stable; to follow at a later date. You want to commit some code which will be merged from &os.current; into the &os.stable; branch after two weeks. ... 
MFC after: 2 weeks Where 2 is the number of days, weeks, or months after which an MFC is planned. The weeks option may be day, days, week, weeks, month, months, or may be left off (in which case, days will be assumed). In some cases you may need to combine some of these. Consider the situation where a user has submitted a PR containing code from the NetBSD project. You are looking at the PR, but it is not an area of the tree you normally work in, so you have decided to get the change reviewed by the arch mailing list. Since the change is complex, you opt to MFC after one month to allow adequate testing. The extra information to include in the commit would look something like PR: foo/54321 Submitted by: John Smith <John.Smith@example.com> Reviewed by: -arch Obtained from: NetBSD MFC after: 1 month How do I access people.FreeBSD.org to put up personal or project information? people.FreeBSD.org is the same as freefall.FreeBSD.org. Just create a public_html directory. Anything you place in that directory will automatically be visible under people.FreeBSD.org.
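The meta-information lines described in the commit message question above (a key word or phrase, a colon, tabs for formatting, then the value) can be produced with a small shell function. This is a sketch under stated assumptions: the names commit_field and commit_trailer are invented for this example, and it emits a single tab after the colon, whereas the text only asks for tabs for formatting.

```shell
#!/bin/sh
# Hypothetical helper: print one commit-log meta line in the form
# "<key word or phrase>:<TAB><value>".
commit_field() {
    printf '%s:\t%s\n' "$1" "$2"
}

# Example trailer combining several fields, using the same sample
# values as the combined example in the text.
commit_trailer() {
    commit_field 'PR' 'foo/54321'
    commit_field 'Submitted by' 'John Smith <John.Smith@example.com>'
    commit_field 'Reviewed by' '-arch'
    commit_field 'Obtained from' 'NetBSD'
    commit_field 'MFC after' '1 month'
}
```

The output of commit_trailer can be pasted at the end of the log message in the editor that cvs commit opens. Generating the lines this way keeps the key words spelled consistently.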
diff --git a/en_US.ISO8859-1/articles/java-tomcat/article.sgml b/en_US.ISO8859-1/articles/java-tomcat/article.sgml index 43927b007e..c28562cd2c 100644 --- a/en_US.ISO8859-1/articles/java-tomcat/article.sgml +++ b/en_US.ISO8859-1/articles/java-tomcat/article.sgml @@ -1,636 +1,636 @@ %man; ]>
Java and Jakarta Tomcat on FreeBSD Victoria Chan
vkchan@kendryl.net
Hiten Pandya
hiten@uk.FreeBSD.org
2002 Victoria Chan Hiten Pandya $FreeBSD$ This document is presented in hopes of making it easier for anyone that needs to get Java up and running on FreeBSD, with the least amount of aggravation. Plan on spending a whole day on such a project as it will take time to assemble all the pieces and compile them individually, and then as a whole. It also shows how to install the famous Jakarta Tomcat Servlet and JSP container on the FreeBSD operating system.
Introduction The Java programming language was born on May 23rd, 1995. One would expect that after all this time, Java applications would be easy to install and ready to run from a single package or port on FreeBSD, thus making it available to the masses. This is not the case, unfortunately, as the Java distribution is held very closely by Sun Microsystems, which prohibits redistribution. All Java Applets must be compiled from source code, together with the Java Development Kit from Sun Microsystems. All these ingredients must be blended together in the right order, assembled, and compiled by the end user. With such distribution philosophies at heart, it is my opinion that Java will always be for developer or hacker use only. I certainly found this to be true when I needed to serve up some .jsp pages for a client on my web server, and needed to get www/jakarta-tomcat to work with www/apache13 on my FreeBSD system. The Tomcat portion of the install is very straightforward, but the difficulty I had was getting the Java Development Kit up and - running for FreeBSD 4.x, as Sun Microsystems only supplies + running for FreeBSD 4.X, as Sun Microsystems only supplies Binaries for Linux, Solaris, and Windows NT. This means that I had to compile my own JDK for FreeBSD. I began by searching for documentation on the Internet. I quickly found that there is more source code than I need along with patches to the source code, but very little documentation of what to do after obtaining everything. In this article, you will find how to install the Java Development Kit for FreeBSD, and how to get up and running with Tomcat. A section is also provided for further reading. The Java Environment Ensure that you have the current ports collection, as make will fail if it attempts to build older source. You can upgrade your entire ports collection by using CVSup. See for more information. You can also download the ports you need manually from to get you going.
You will need the Linux Emulation (Linux-ABI) enabled in your kernel configuration. Simply add the following option to your kernel configuration file and recompile it. Instructions for building a kernel can be found in the FreeBSD Handbook. options COMPAT_LINUX The above option will add Linux-ABI support to your kernel, when it is recompiled. The list of dependencies below, are required to be installed manually in a certain order. Dependencies that are automatically downloaded are not listed here. java/jdk13 java/linux-jdk13 archivers/gtar archivers/bzip2 archivers/unzip archivers/zip You will need to get the following: Download bsd-jdk131-patches-5.tar.gz from and place it under /usr/ports/distfiles. Next get out your web browser and head on over to and find SDK downloads. Click on the continue button below GNUZIP Tar Shell Script. Be sure you read every word of the license page before you click on the Accept button! You will be brought to a page titled Download Java(TM) 2 SDK, Standard Edition 1.3.1_02. Scroll to the bottom and click on the HTTP download button. When the File Download box comes up, be sure to click on the Open button rather than the Save button. You will be presented with another File Download box - this time choose Save and you will be able to save j2sdk-1_3_1_02-linux-i386.bin. Place it in /usr/ports/distfiles. Go to . In the table under Produce Description, named Java 2 SDK 1.3.1, go to the right-hand cell and click download. You will be taken to the Sign On page, where you must sign in if you already have an account, or register for access. Once you have signed on, you will be taken to the Legal page, where you must accept the license agreement; scroll down (reading the license) and click on the Continue button. Next page, is the Receipt page. This is where you will save you order number. You will be able to choose the location that is nearest to you. Click on Java 2 SDK, Standard Edition, version 1.3.1. 
Save the j2sdk-1_3_1-src.tar.gz to the /usr/ports/distfiles/ directory. It is very important for you to read the License Agreement which has been issued by Sun Microsystems Corp. There are several restrictions in place on the use of Java, which you must address. The FreeBSD Project does not take any responsibilities for your actions. Do not discard any of the downloaded files, as they will be needed for building some of the native ports for FreeBSD, which are discussed later on. Now that you have assembled all the source files and ports, you need to start by building java/linux-jdk13: &prompt.root; cd /usr/ports/archivers/gtar; make all install clean &prompt.root; cd /usr/ports/archivers/unzip; make all install clean &prompt.root; cd /usr/ports/archivers/zip; make all install clean And finally: &prompt.root; cd /usr/ports/java/linux-jdk13 &prompt.root; make all install clean Once you have built java/linux-jdk13, you need to test it, to make sure it works as intended. To do that: &prompt.root; cd /usr/local/linux-jdk1.3.1/bin &prompt.root; ./java -version The output of the above command should be as follows: java version "1.3.1_02" Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_02-b02) Classic VM (build 1.3.1_02-b02, green threads, nojit) If you did not get the correct response, you need to: &prompt.root; cd /usr/ports/java/linux-jdk13 &prompt.root; make deinstall And make sure that /usr/local does not contain a linux-jdk1.3.1 directory. If you find a fragment of the directory, delete it. Repeat the build and install process for java/linux-jdk13. To make the native Java Development Kit 1.3.1 for FreeBSD, do the following: Make sure you have the j2sdk-1_3_1-src.tar.gz file in your /usr/ports/distfiles. This file is needed for applying the patch-sets discussed below. You will need to download the patch set for building the port. The patch-set file is called bsd-jdk131-patches-6.tar.gz. 
You should also verify the integrity of the file by comparing it against the following MD5 checksum. The patch-set is called Patch-set 6. MD5 (bsd-jdk131-patches-6.tar.gz) = 9cade10b81d6034fdd2176bef32bdbf9 The patch-set is available from: The last procedure discussed above (building the native jdk) will take some time. Jakarta Tomcat Setup Overview Java is becoming ever more popular for building diverse and scalable platform independent solutions. One of the fastest-growing uses of Java is in the ASP (Application Service Provider) market. Java serves as the perfect solution for these types of markets, with the following advantages: Platform Independence Industry Wide Commitment Scalability Reliable Performance Distributed, Multi-threaded, Secure etc. A very important and growing technology which has emerged from Java is JSP (JavaServer Pages). JSP is a server-side technology introduced by Sun Microsystems Corp., which provides a quick, simple way to generate dynamic content from within HTML pages. It uses XML tags along with Java scriptlets to encapsulate and separate the logic from the design and display. When a JSP page is invoked, it is dynamically converted into a Servlet and processed by the server to produce the resulting HTML/XML page for the client. When JSP is used in conjunction with JavaBeans, it is possible to produce very diverse and scalable applications, which may be combined with the strength and performance of FreeBSD. Tomcat is an open-source implementation of the Java Servlets and JavaServer Pages technologies, developed under the Jakarta project at the Apache Software Foundation. Tomcat implements a new Servlet framework (called Catalina) that is based on a completely new architecture and implements the Servlet 2.3 and JSP 1.2 specifications. It includes many additional features that make it a useful platform for developing and deploying web applications and web services. 
In a nutshell, Tomcat is an application server written in 100% Pure Java. Tomcat is used for many purposes, and is not limited to Application Servers. It provides an open platform to develop extensible web and content management services. When Tomcat is used with an optimized FreeBSD system, it can provide highly reliable and fast services. Please refer to the section for more information on Tomcat and JSP. The next section will demonstrate how to build the Tomcat Environment for FreeBSD. The version of Tomcat used in this guide is 4.0.3. This version contains major bug fixes, and the following updates/changes: JSP 1.2 Specification Java Servlet 2.3 Specification Full backward compatibility with the Java Servlet 2.2 and JSP 1.1 Specification The Tomcat environment for FreeBSD It is very simple to install Tomcat on a FreeBSD machine, after setting up the necessary Java environment, which we have previously completed. In order to set up Tomcat on FreeBSD, follow the procedure below: Follow the above steps to set up the necessary Java environment. Set an environment variable JAVA_HOME which points to the directory where you have installed the JDK (the example below points to a native build of the JDK): &prompt.root; setenv JAVA_HOME /usr/local/jdk1.3.1 (for C Shells) &prompt.root; export JAVA_HOME=/usr/local/jdk1.3.1 (for Bourne Shells) This environment variable should be made permanent by adding it to either .profile or .cshrc, depending on the shell you are using. This variable is crucial for the functioning of all Java based programs, including Tomcat itself. Download the Tomcat binary distribution from the Jakarta website, which is located at . The file to download is called jakarta-tomcat-4.0.3.tar.gz. The compressed and archived file we downloaded in the previous step uses special GNU Extensions. 
In order to untar and uncompress the file, we will need to install GNU Tar (archivers/gtar), by doing the following: &prompt.root; cd /usr/ports/archivers/gtar && make all install clean Untar and uncompress the jakarta-tomcat-4.0.3.tar.gz file into the /usr/local directory and rename the directory to tomcat-4.0 for ease of reference: &prompt.root; cd /usr/local &prompt.root; gtar zxvf jakarta-tomcat-4.0.3.tar.gz &prompt.root; ls jakarta* jakarta-tomcat-4.0.3 &prompt.root; mv jakarta-tomcat-4.0.3 tomcat-4.0 You can remove the jakarta-tomcat-4.0.3.tar.gz at your preference. Installation by using the source code is currently out of scope for this document. Please refer to the following files for additional information on building from source, available from your Tomcat distribution directory: /usr/local/tomcat-4.0/README.txt /usr/local/tomcat-4.0/BUILDING.txt Operating Tomcat - Basics Now that we have finished installing Tomcat, the following example shows how to start the Tomcat server: &prompt.root; cd /usr/local/tomcat-4.0/bin &prompt.root; ./startup.sh (for starting Tomcat) You can test if your Tomcat server has started by visiting the following URL: http://127.0.0.1:8080 or http://localhost:8080. To stop Tomcat: &prompt.root; cd /usr/local/tomcat-4.0/bin &prompt.root; ./shutdown.sh (for stopping Tomcat) The startup.sh and shutdown.sh are frontends to the catalina.sh executable script in the same directory; if you would like to start Tomcat automatically at boot-time run: &prompt.root; cd /usr/local/etc/rc.d &prompt.root; ln -s /usr/local/tomcat-4.0/bin/catalina.sh Edit the catalina.sh, and add the following at the beginning of the file (after the comment box): JAVA_HOME=/usr/local/jdk1.3.1 If your port 8080 is occupied by some other service, you can change it by editing the server.xml in your Tomcat's conf/ directory. In the example below, the port will be changed to 80, assuming there is no service running on that port. 
&prompt.root; cd /usr/local/tomcat-4.0/conf &prompt.root; fgrep -n 8080 server.xml ~65: By default, a non-SSL HTTP/1.1 Connector is established on port 8080. ~89: port="8080" minProcessors="5" maxProcessors="75" &prompt.root; cp server.xml server.xml.orig &prompt.root; sed 's/8080/80/' server.xml.orig > server.xml Note that sed reads from the saved copy: redirecting the output of a command onto the same file it is reading would truncate server.xml before it could be read. Reference The FreeBSD Java Project JavaSoft. Home of Java The Sun Community Source Licensing for Java Jakarta Tomcat Homepage J2SE Documentation FreeBSD Ports - Java Section Conclusion Finally, we are at the end of the article and have a working version of Tomcat. We hope that you have learned the basics of installing and building the Java Development Kit on FreeBSD, along with installation of the Tomcat binary distribution application server released by the Apache Software Foundation. The section contains pointers to additional resources on this topic, some of which are in print, some of which are on the World Wide Web, or both. The most important thing is drive space. I suggest having 700MB or more free space in /usr. I hope this article has helped you in some small way. For questions, comments, compliments, or rants, please direct them to Victoria Chan.
diff --git a/en_US.ISO8859-1/articles/releng/branches.ascii b/en_US.ISO8859-1/articles/releng/branches.ascii index 1f3f198a58..531bed46bf 100644 --- a/en_US.ISO8859-1/articles/releng/branches.ascii +++ b/en_US.ISO8859-1/articles/releng/branches.ascii @@ -1,30 +1,30 @@ $FreeBSD$
[ASCII diagram "FreeBSD Development Branches"; the layout was lost in extraction. Labels that survive: HEAD; 3.0-RELEASE; RELENG_3 with 3.1-RELEASE, 3.2R, 3.3R, 3.4R, 3.5R, 3.5.1R and 3.X-STABLE; 4.0-CURRENT; RELENG_4 with 4.0-RELEASE, 4.1R, 4.1.1R, 4.2R, 4.3R, 4.4R and 4.X-STABLE; RELENG_4_3; RELENG_4_4; 5.0-CURRENT. The diff changes two labels: "3.x-STABLE" becomes "3.X-STABLE" and "4.x-STABLE" becomes "4.X-STABLE".]
diff --git a/en_US.ISO8859-1/articles/vm-design/article.sgml b/en_US.ISO8859-1/articles/vm-design/article.sgml index 9e992c6039..f4f2487d4b 100644 --- a/en_US.ISO8859-1/articles/vm-design/article.sgml +++ b/en_US.ISO8859-1/articles/vm-design/article.sgml @@ -1,838 +1,838 @@ %man; ]>
Design elements of the FreeBSD VM system Matthew Dillon
dillon@apollo.backplane.com
The title is really just a fancy way of saying that I am going to attempt to describe the whole VM enchilada, hopefully in a way that everyone can follow. For the last year I have concentrated on a number of major kernel subsystems within FreeBSD, with the VM and Swap subsystems being the most interesting and NFS being a necessary chore. I rewrote only small portions of the code. In the VM arena the only major rewrite I have done is to the swap subsystem. Most of my work was cleanup and maintenance, with only moderate code rewriting and no major algorithmic adjustments within the VM subsystem. The bulk of the VM subsystem's theoretical base remains unchanged and a lot of the credit for the modernization effort in the last few years belongs to John Dyson and David Greenman. Not being a historian like Kirk I will not attempt to tag all the various features with people's names, since I will invariably get it wrong. This article was originally published in the January 2000 issue of DaemonNews. This version of the article may include updates from Matt and other authors to reflect changes in FreeBSD's VM implementation.
Introduction Before moving along to the actual design let's spend a little time on the necessity of maintaining and modernizing any long-living codebase. In the programming world, algorithms tend to be more important than code and it is precisely due to BSD's academic roots that a great deal of attention was paid to algorithm design from the beginning. More attention paid to the design generally leads to a clean and flexible codebase that can be fairly easily modified, extended, or replaced over time. While BSD is considered an old operating system by some people, those of us who work on it tend to view it more as a mature codebase which has various components modified, extended, or replaced with modern code. It has evolved, and FreeBSD is at the bleeding edge no matter how old some of the code might be. This is an important distinction to make and one that is unfortunately lost to many people. The biggest error a programmer can make is to not learn from history, and this is precisely the error that many other modern operating systems have made. NT is the best example of this, and the consequences have been dire. Linux also makes this mistake to some degree—enough that we BSD folk can make small jokes about it every once in a while, anyway. Linux's problem is simply one of a lack of experience and history to compare ideas against, a problem that is easily and rapidly being addressed by the Linux community in the same way it has been addressed in the BSD community—by continuous code development. The NT folk, on the other hand, repeatedly make the same mistakes solved by Unix decades ago and then spend years fixing them. Over and over again. They have a severe case of not designed here and we are always right because our marketing department says so. I have little tolerance for anyone who cannot learn from history. 
Much of the apparent complexity of the FreeBSD design, especially in the VM/Swap subsystem, is a direct result of having to solve serious performance issues that occur under various conditions. These issues are not due to bad algorithmic design but instead arise from environmental factors. In any direct comparison between platforms, these issues become most apparent when system resources begin to get stressed. As I describe FreeBSD's VM/Swap subsystem the reader should always keep two points in mind. First, the most important aspect of performance design is what is known as Optimizing the Critical Path. It is often the case that performance optimizations add a little bloat to the code in order to make the critical path perform better. Second, a solid, generalized design outperforms a heavily-optimized design over the long run. While a generalized design may end up being slower than a heavily-optimized design when they are first implemented, the generalized design tends to be easier to adapt to changing conditions and the heavily-optimized design winds up having to be thrown away. Any codebase that will survive and be maintainable for years must therefore be designed properly from the beginning even if it costs some performance. Twenty years ago people were still arguing that programming in assembly was better than programming in a high-level language because it produced code that was ten times as fast. Today, the fallacy of that argument is obvious, as are the parallels to algorithmic design and code generalization. VM Objects The best way to begin describing the FreeBSD VM system is to look at it from the perspective of a user-level process. Each user process sees a single, private, contiguous VM address space containing several types of memory objects. These objects have various characteristics. 
Program code and program data are effectively a single memory-mapped file (the binary file being run), but program code is read-only while program data is copy-on-write. Program BSS is just memory allocated and filled with zeros on demand, called demand zero page fill. Arbitrary files can be memory-mapped into the address space as well, which is how the shared library mechanism works. Such mappings can require modifications to remain private to the process making them. The fork system call adds an entirely new dimension to the VM management problem on top of the complexity already given. A program binary data page (which is a basic copy-on-write page) illustrates the complexity. A program binary contains a preinitialized data section which is initially mapped directly from the program file. When a program is loaded into a process's VM space, this area is initially memory-mapped and backed by the program binary itself, allowing the VM system to free/reuse the page and later load it back in from the binary. The moment a process modifies this data, however, the VM system must make a private copy of the page for that process. Since the private copy has been modified, the VM system may no longer free it, because there is no longer any way to restore it later on. You will notice immediately that what was originally a simple file mapping has become much more complex. Data may be modified on a page-by-page basis whereas the file mapping encompasses many pages at once. The complexity further increases when a process forks. When a process forks, the result is two processes—each with their own private address spaces, including any modifications made by the original process prior to the call to fork(). It would be silly for the VM system to make a complete copy of the data at the time of the fork() because it is quite possible that at least one of the two processes will only need to read from that page from then on, allowing the original page to continue to be used. 
What was a private page is made copy-on-write again, since each process (parent and child) expects its own post-fork modifications to remain private and not affect the other. FreeBSD manages all of this with a layered VM Object model. The original binary program file winds up being the lowest VM Object layer. A copy-on-write layer is pushed on top of that to hold those pages which had to be copied from the original file. If the program modifies a data page belonging to the original file the VM system takes a fault and makes a copy of the page in the higher layer. When a process forks, additional VM Object layers are pushed on. This might make a little more sense with a fairly basic example. A fork() is a common operation for any *BSD system, so this example will consider a program that starts up, and forks. When the process starts, the VM system creates an object layer, let's call this A: +---------------+ | A | +---------------+ A represents the file: pages may be paged in and out of the file's physical media as necessary. Paging in from the disk is reasonable for a program, but we really do not want to page back out and overwrite the executable. The VM system therefore creates a second layer, B, that will be physically backed by swap space: +---------------+ | B | +---------------+ | A | +---------------+ On the first write to a page after this, a new page is created in B, and its contents are initialized from A. All pages in B can be paged in or out to a swap device. When the program forks, the VM system creates two new object layers, C1 for the parent and C2 for the child, that rest on top of B: +-------+-------+ | C1 | C2 | +-------+-------+ | B | +---------------+ | A | +---------------+ In this case, let's say a page in B is modified by the original parent process. The process will take a copy-on-write fault and duplicate the page in C1, leaving the original page in B untouched. 
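The layering just described can be modeled in a few lines. This is a toy sketch only: the class name `VMObject`, its `backing` and `pages` attributes, and the use of strings as page contents are all assumptions made for illustration, not FreeBSD's actual structures. It shows a read falling through the stack to the first layer that holds the page, and a write always landing in the topmost layer.

```python
class VMObject:
    """One layer in a toy shadow chain; `backing` is the layer underneath."""

    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing
        self.pages = {}  # page index -> contents held privately by this layer

    def read(self, index):
        """Return the page from the topmost layer that holds it."""
        layer = self
        while layer is not None:
            if index in layer.pages:
                return layer.pages[index]
            layer = layer.backing
        return None  # no layer holds the page

    def write(self, index, data):
        """Model a copy-on-write fault: the new contents land in this layer
        and every layer underneath is left untouched."""
        self.pages[index] = data


# The stack from the text: A (the file), B (swap-backed), then C1 and C2.
a = VMObject("A")
a.pages[0] = "data from file"
b = VMObject("B", backing=a)
b.write(0, "modified before fork")  # first write copies the page into B
c1 = VMObject("C1", backing=b)      # parent after fork()
c2 = VMObject("C2", backing=b)      # child after fork()

c1.write(0, "parent's change")      # the parent takes a copy-on-write fault
```

After the last line the parent sees its own copy through C1, while the child, reading through C2, still sees the page held in B.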
Now, let's say the same page in B is modified by the child process. The process will take a copy-on-write fault and duplicate the page in C2. The original page in B is now completely hidden since both C1 and C2 have a copy, and the page in B could theoretically be destroyed (if B does not represent a real file). However, this sort of optimization is not trivial to make because it is so fine-grained. FreeBSD does not make this optimization. Now, suppose (as is often the case) that the child process does an exec(). Its current address space is usually replaced by a new address space representing a new file. In this case, the C2 layer is destroyed: +-------+ | C1 | +-------+-------+ | B | +---------------+ | A | +---------------+ In this case, the number of children of B drops to one, and all accesses to B now go through C1. This means that B and C1 can be collapsed together. Any pages in B that also exist in C1 are deleted from B during the collapse. Thus, even though the optimization in the previous step could not be made, we can recover the dead pages when either of the processes exits or calls exec(). This model creates a number of potential problems. The first is that you can wind up with a relatively deep stack of layered VM Objects which can cost scanning time and memory when you take a fault. Deep layering can occur when processes fork and then fork again (either parent or child). The second problem is that you can wind up with dead, inaccessible pages deep in the stack of VM Objects. In our last example if both the parent and child processes modify the same page, they both get their own private copies of the page and the original page in B is no longer accessible by anyone. That page in B can be freed. FreeBSD solves the deep layering problem with a special optimization called the All Shadowed Case. This case occurs if either C1 or C2 takes sufficient COW faults to completely shadow all pages in B. Let's say that C1 achieves this. 
C1 can now bypass B entirely, so rather than have C1->B->A and C2->B->A we now have C1->A and C2->B->A. But look what also happened: now B has only one reference (C2), so we can collapse B and C2 together. The end result is that B is deleted entirely and we have C1->A and C2->A. It is often the case that B will contain a large number of pages and neither C1 nor C2 will be able to completely overshadow it. If we fork again and create a set of D layers, however, it is much more likely that one of the D layers will eventually be able to completely overshadow the much smaller dataset represented by C1 or C2. The same optimization will work at any point in the graph and the grand result of this is that even on a heavily forked machine VM Object stacks tend to not get much deeper than 4. This is true of both the parent and the children and true whether the parent is doing the forking or whether the children cascade forks. The dead page problem still exists in the case where C1 or C2 do not completely overshadow B. Due to our other optimizations this case does not represent much of a problem and we simply allow the pages to be dead. If the system runs low on memory it will swap them out, eating a little swap, but that is it. The advantage to the VM Object model is that fork() is extremely fast, since no real data copying need take place. The disadvantage is that you can build a relatively complex VM Object layering that slows page fault handling down a little, and you spend memory managing the VM Object structures. The optimizations FreeBSD makes prove to reduce the problems enough that they can be ignored, leaving no real disadvantage. SWAP Layers Private data pages are initially either copy-on-write or zero-fill pages. When a change, and therefore a copy, is made, the original backing object (usually a file) can no longer be used to save a copy of the page when the VM system needs to reuse it for other purposes. This is where SWAP comes in. 
SWAP is allocated to create backing store for memory that does not otherwise have it. FreeBSD allocates the swap management structure for a VM Object only when it is actually needed. However, the swap management structure has had problems historically. - Under FreeBSD 3.x the swap management structure preallocates an + Under FreeBSD 3.X the swap management structure preallocates an array that encompasses the entire object requiring swap backing store, even if only a few pages of that object are swap-backed. This creates a kernel memory fragmentation problem when large objects are mapped, or processes with large runsizes (RSS) fork. Also, in order to keep track of swap space, a list of holes is kept in kernel memory, and this tends to get severely fragmented as well. Since the list of holes is a linear list, the swap allocation and freeing performance is a non-optimal O(n)-per-page. It also requires kernel memory allocations to take place during the swap freeing process, and that creates low memory deadlock problems. The problem is further exacerbated by holes created due to the interleaving algorithm. Also, the swap block map can become fragmented fairly easily resulting in non-contiguous allocations. Kernel memory must also be allocated on the fly for additional swap management structures when a swapout occurs. It is evident that there was plenty of room for improvement. - For FreeBSD 4.x, I completely rewrote the swap subsystem. With this + For FreeBSD 4.X, I completely rewrote the swap subsystem. With this rewrite, swap management structures are allocated through a hash table rather than a linear array, giving them a fixed allocation size and much finer granularity. Rather than using a linearly linked list to keep track of swap space reservations, it now uses a bitmap of swap blocks arranged in a radix tree structure with free-space hinting in the radix node structures. This effectively makes swap allocation and freeing an O(1) operation. 
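To make the radix-tree idea concrete, here is a minimal sketch under stated assumptions. The class `SwapRadix`, the fanout of 16, and the use of a per-node count of free blocks as the hint are all choices made for this illustration; the kernel's real structure is more elaborate (for example, it can hand out contiguous runs of blocks). What the sketch does show is why the cost stops depending on how much swap is in use: an allocation walks one node per level and uses the hints to skip subtrees that are full, and all memory is set up when the tree is created.

```python
class SwapRadix:
    """Toy radix-tree bitmap of swap blocks (illustration only)."""

    RADIX = 16  # hypothetical fanout; also the number of blocks per leaf

    def __init__(self, nblocks):
        if nblocks <= 0:
            raise ValueError("nblocks must be positive")
        r = self.RADIX
        self.nblocks = nblocks
        nleaves = (nblocks + r - 1) // r
        # Everything is allocated here, up front, so alloc() and free()
        # never need new memory later.
        leaf_free = [min(r, nblocks - i * r) for i in range(nleaves)]
        self.bitmaps = [0] * nleaves  # a set bit means "block in use"
        # Blocks past nblocks in the last leaf do not exist: mark them used.
        self.bitmaps[-1] = ((1 << r) - 1) & ~((1 << leaf_free[-1]) - 1)
        # counts[0] holds the free blocks per leaf; each higher level sums
        # RADIX children, ending in a single root entry. These are the hints.
        self.counts = [leaf_free]
        while len(self.counts[-1]) > 1:
            below = self.counts[-1]
            self.counts.append(
                [sum(below[i:i + r]) for i in range(0, len(below), r)]
            )

    def _adjust(self, leaf, delta):
        """Propagate a change in one leaf's free count up to the root."""
        index = leaf
        for level in self.counts:
            level[index] += delta
            index //= self.RADIX

    def alloc(self):
        """Return the lowest free block number, or None if swap is full."""
        r = self.RADIX
        if self.counts[-1][0] == 0:
            return None  # the root hint says nothing is free
        node = 0
        for level in range(len(self.counts) - 1, 0, -1):
            first = node * r
            children = self.counts[level - 1][first:first + r]
            # Step into the first child whose hint reports free space.
            node = first + next(i for i, free in enumerate(children) if free)
        bits = self.bitmaps[node]
        bit = next(b for b in range(r) if not bits & (1 << b))
        self.bitmaps[node] = bits | (1 << bit)
        self._adjust(node, -1)
        return node * r + bit

    def free(self, block):
        """Release a previously allocated block."""
        if not 0 <= block < self.nblocks:
            raise ValueError("no such block")
        leaf, bit = divmod(block, self.RADIX)
        if not self.bitmaps[leaf] & (1 << bit):
            raise ValueError("block is not allocated")
        self.bitmaps[leaf] &= ~(1 << bit)
        self._adjust(leaf, +1)
```

The number of levels depends only on the size of the swap area, not on how fragmented it is, which is the sense in which the work per allocation is bounded.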
The entire radix tree bitmap is also preallocated in order to avoid having to allocate kernel memory during critical low memory swapping operations. After all, the system tends to swap when it is low on memory so we should avoid allocating kernel memory at such times in order to avoid potential deadlocks. Finally, to reduce fragmentation the radix tree is capable of allocating large contiguous chunks at once, skipping over smaller fragmented chunks. I did not take the final step of having an allocating hint pointer that would trundle through a portion of swap as allocations were made in order to further guarantee contiguous allocations or at least locality of reference, but I ensured that such an addition could be made. When to free a page Since the VM system uses all available memory for disk caching, there are usually very few truly-free pages. The VM system depends on being able to properly choose pages which are not in use to reuse for new allocations. Selecting the optimal pages to free is possibly the single-most important function any VM system can perform because if it makes a poor selection, the VM system may be forced to unnecessarily retrieve pages from disk, seriously degrading system performance. How much overhead are we willing to suffer in the critical path to avoid freeing the wrong page? Each wrong choice we make will cost us hundreds of thousands of CPU cycles and a noticeable stall of the affected processes, so we are willing to endure a significant amount of overhead in order to be sure that the right page is chosen. This is why FreeBSD tends to outperform other systems when memory resources become stressed. The free page determination algorithm is built upon a history of the use of memory pages. To acquire this history, the system takes advantage of a page-used bit feature that most hardware page tables have. 
In any case, the page-used bit is cleared and at some later point the VM system comes across the page again and sees that the page-used bit has been set. This indicates that the page is still being actively used. If the bit is still clear it is an indication that the page is not being actively used. By testing this bit periodically, a use history (in the form of a counter) for the physical page is developed. When the VM system later needs to free up some pages, checking this history becomes the cornerstone of determining the best candidate page to reuse. What if the hardware has no page-used bit? For those platforms that do not have this feature, the system actually emulates a page-used bit. It unmaps or protects a page, forcing a page fault if the page is accessed again. When the page fault is taken, the system simply marks the page as having been used and unprotects the page so that it may be used. While taking such page faults just to determine if a page is being used appears to be an expensive proposition, it is much less expensive than reusing the page for some other purpose only to find that a process needs it back and then have to go to disk. FreeBSD makes use of several page queues to further refine the selection of pages to reuse as well as to determine when dirty pages must be flushed to their backing store. Since page tables are dynamic entities under FreeBSD, it costs virtually nothing to unmap a page from the address space of any processes using it. When a page candidate has been chosen based on the page-use counter, this is precisely what is done. The system must make a distinction between clean pages which can theoretically be freed up at any time, and dirty pages which must first be written to their backing store before being reusable. When a page candidate has been found it is moved to the inactive queue if it is dirty, or the cache queue if it is clean. 
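The use history and the choice of queue can be sketched as follows. This is a hedged illustration, not FreeBSD's code: the names `Page`, `scan`, `pick_victim`, and `ACT_MAX`, and the rule of adding or subtracting one per scan, are assumptions made for the example. Only the general scheme comes from the text: a periodic pass folds the page-used bit into a counter and clears the bit, and a chosen candidate goes to the inactive queue if it is dirty or to the cache queue if it is clean.

```python
ACT_MAX = 64  # hypothetical cap on the use counter


class Page:
    def __init__(self, name, dirty=False):
        self.name = name
        self.dirty = dirty
        self.referenced = False  # stands in for the hardware page-used bit
        self.act_count = 0       # use history built up over several scans


def scan(pages):
    """One periodic pass: fold the page-used bit into the history, then
    clear the bit so that the next pass observes fresh activity."""
    for page in pages:
        if page.referenced:
            page.act_count = min(page.act_count + 1, ACT_MAX)
        else:
            page.act_count = max(page.act_count - 1, 0)
        page.referenced = False


def pick_victim(pages):
    """Choose the page with the weakest use history and name the queue it
    would move to: 'inactive' if dirty, 'cache' if clean."""
    victim = min(pages, key=lambda page: page.act_count)
    return victim, ("inactive" if victim.dirty else "cache")
```

For example, if one page is touched before each of three scans and another is never touched, the untouched page ends up with the lower counter and is the one selected.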
A separate algorithm based on the dirty-to-clean page ratio determines when dirty pages in the inactive queue must be flushed to disk. Once this is accomplished, the flushed pages are moved from the inactive queue to the cache queue. At this point, pages in the cache queue can still be reactivated by a VM fault at relatively low cost. However, pages in the cache queue are considered to be immediately freeable and will be reused in an LRU (least-recently used) fashion when the system needs to allocate new memory. It is important to note that the FreeBSD VM system attempts to separate clean and dirty pages for the express reason of avoiding unnecessary flushes of dirty pages (which eat I/O bandwidth), and that it does not move pages between the various page queues gratuitously when the memory subsystem is not being stressed. This is why you will see some systems with very low cache queue counts and high active queue counts when doing a systat -vm command. As the VM system becomes more stressed, it makes a greater effort to maintain the various page queues at the levels determined to be the most effective. An urban myth has circulated for years that Linux did a better job avoiding swapouts than FreeBSD, but this in fact is not true. What was actually occurring was that FreeBSD was proactively paging out unused pages in order to make room for more disk cache while Linux was keeping unused pages in core and leaving less memory available for cache and process pages. I do not know whether this is still true today. Pre-Faulting and Zeroing Optimizations Taking a VM fault is not expensive if the underlying page is already in core and can simply be mapped into the process, but it can become expensive if you take a whole lot of them on a regular basis. A good example of this is running a program such as &man.ls.1; or &man.ps.1; over and over again. 
If the program binary is mapped into memory but not mapped into the page table, then all the pages that will be accessed by the program will have to be faulted in every time the program is run. This is unnecessary when the pages in question are already in the VM Cache, so FreeBSD will attempt to pre-populate a process's page tables with those pages that are already in the VM Cache. One thing that FreeBSD does not yet do is pre-copy-on-write certain pages on exec. For example, if you run the &man.ls.1; program while running vmstat 1 you will notice that it always takes a certain number of page faults, even when you run it over and over again. These are zero-fill faults, not program code faults (which were pre-faulted in already). Pre-copying pages on exec or fork is an area that could use more study. A large percentage of page faults that occur are zero-fill faults. You can usually see this by observing the vmstat -s output. These occur when a process accesses pages in its BSS area. The BSS area is expected to be initially zero but the VM system does not bother to allocate any memory at all until the process actually accesses it. When a fault occurs the VM system must not only allocate a new page, it must zero it as well. To optimize the zeroing operation the VM system has the ability to pre-zero pages and mark them as such, and to request pre-zeroed pages when zero-fill faults occur. The pre-zeroing occurs whenever the CPU is idle but the number of pages the system pre-zeros is limited in order to avoid blowing away the memory caches. This is an excellent example of adding complexity to the VM system in order to optimize the critical path. Page Table Optimizations The page table optimizations make up the most contentious part of the FreeBSD VM design and they have shown some strain with the advent of serious use of mmap(). I think this is actually a feature of most BSDs though I am not sure when it was first introduced. There are two major optimizations. 
The first is that hardware page tables do not contain persistent state but instead can be thrown away at any time with only a minor amount of management overhead. The second is that every active page table entry in the system has a governing pv_entry structure which is tied into the vm_page structure. FreeBSD can simply iterate through those mappings that are known to exist while Linux must check all page tables that might contain a specific mapping to see if it does, which can achieve O(n^2) overhead in certain situations. It is because of this that FreeBSD tends to make better choices on which pages to reuse or swap when memory is stressed, giving it better performance under load. However, FreeBSD requires kernel tuning to accommodate large-shared-address-space situations such as those that can occur in a news system because it may run out of pv_entry structures. Both Linux and FreeBSD need work in this area. FreeBSD is trying to maximize the advantage of a potentially sparse active-mapping model (not all processes need to map all pages of a shared library, for example), whereas Linux is trying to simplify its algorithms. FreeBSD generally has the performance advantage here at the cost of wasting a little extra memory, but FreeBSD breaks down in the case where a large file is massively shared across hundreds of processes. Linux, on the other hand, breaks down in the case where many processes are sparsely-mapping the same shared library and also runs non-optimally when trying to determine whether a page can be reused or not. Page Coloring We will end with the page coloring optimizations. Page coloring is a performance optimization designed to ensure that accesses to contiguous pages in virtual memory make the best use of the processor cache. In ancient times (i.e. 10+ years ago) processor caches tended to map virtual memory rather than physical memory. 
This led to a huge number of problems including having to clear the cache on every context switch in some cases, and problems with data aliasing in the cache. Modern processor caches map physical memory precisely to solve those problems. This means that two side-by-side pages in a process's address space may not correspond to two side-by-side pages in the cache. In fact, if you are not careful side-by-side pages in virtual memory could wind up using the same page in the processor cache, leading to cacheable data being thrown away prematurely and reducing CPU performance. This is true even with multi-way set-associative caches (though the effect is mitigated somewhat). FreeBSD's memory allocation code implements page coloring optimizations, which means that the memory allocation code will attempt to locate free pages that are contiguous from the point of view of the cache. For example, if page 16 of physical memory is assigned to page 0 of a process's virtual memory and the cache can hold 4 pages, the page coloring code will not assign page 20 of physical memory to page 1 of a process's virtual memory. It would, instead, assign page 21 of physical memory. The page coloring code attempts to avoid assigning page 20 because this maps over the same cache memory as page 16 and would result in non-optimal caching. This code adds a significant amount of complexity to the VM memory allocation subsystem as you can well imagine, but the result is well worth the effort. Page coloring makes VM memory as deterministic as physical memory with regard to cache performance. Conclusion Virtual memory in modern operating systems must address a number of different issues efficiently and for many different usage patterns. The modular and algorithmic approach that BSD has historically taken allows us to study and understand the current implementation as well as relatively cleanly replace large sections of the code.
There have been a number of improvements to the FreeBSD VM system in the last several years, and work is ongoing. Bonus QA session by Allen Briggs <email>briggs@ninthwonder.com</email> What is the interleaving algorithm that you - refer to in your listing of the ills of the FreeBSD 3.x swap + refer to in your listing of the ills of the FreeBSD 3.X swap arrangements? FreeBSD uses a fixed swap interleave which defaults to 4. This means that FreeBSD reserves space for four swap areas even if you only have one, two, or three. Since swap is interleaved the linear address space representing the four swap areas will be fragmented if you do not actually have four swap areas. For example, if you have two swap areas A and B FreeBSD's address space representation for that swap area will be interleaved in blocks of 16 pages: A B C D A B C D A B C D A B C D - FreeBSD 3.x uses a sequential list of free + FreeBSD 3.X uses a sequential list of free regions approach to accounting for the free swap areas. The idea is that large blocks of free linear space can be represented with a single list node (kern/subr_rlist.c). But due to the fragmentation the sequential list winds up being insanely fragmented. In the above example, completely unused swap will have A and B shown as free and C and D shown as all allocated. Each A-B sequence requires a list node to account for because C and D are holes, so the list node cannot be combined with the next A-B sequence. Why do we interleave our swap space instead of just tack swap areas onto the end and do something fancier? Because it is a whole lot easier to allocate linear swaths of an address space and have the result automatically be interleaved across multiple disks than it is to try to put that sophistication elsewhere. The fragmentation causes other problems. 
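To see why the fixed interleave fragments the free list, it can help to count the free runs directly. The sketch below is a toy model and not the code in kern/subr_rlist.c: `free_runs` is a hypothetical helper, and the only figures taken from the text are the interleave of 4 and the 16-page blocks.

```python
INTERLEAVE = 4  # fixed swap interleave, from the text
STRIPE = 16     # pages per interleave block, from the text


def free_runs(total_pages, configured_areas):
    """Return (start, length) runs of free pages in the linear swap address space.

    Blocks that belong to a swap area that does not exist are holes, which the
    list has to treat as allocated.
    """
    runs = []
    start = None
    for page in range(total_pages):
        area = (page // STRIPE) % INTERLEAVE
        is_free = area < configured_areas
        if is_free and start is None:
            start = page
        elif not is_free and start is not None:
            runs.append((start, page - start))
            start = None
    if start is not None:
        runs.append((start, total_pages - start))
    return runs


two_areas = free_runs(256, configured_areas=2)   # only A and B exist
four_areas = free_runs(256, configured_areas=4)  # A, B, C and D all exist
```

With all four areas present, completely unused swap collapses into a single run. With only A and B, every A-B pair becomes its own run, so the number of list nodes grows with the size of the swap space.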
Being a linear list - under 3.x, and having such a huge amount of inherent + under 3.X, and having such a huge amount of inherent fragmentation, allocating and freeing swap winds up being an O(N) algorithm instead of an O(1) algorithm. Combined with other factors (heavy swapping) and you start getting into O(N^2) and - O(N^3) levels of overhead, which is bad. The 3.x system may also + O(N^3) levels of overhead, which is bad. The 3.X system may also need to allocate KVM during a swap operation to create a new list node which can lead to a deadlock if the system is trying to pageout pages in a low-memory situation. - Under 4.x we do not use a sequential list. Instead we use a + Under 4.X we do not use a sequential list. Instead we use a radix tree and bitmaps of swap blocks rather than ranged list nodes. We take the hit of preallocating all the bitmaps required for the entire swap area up front but it winds up wasting less memory due to the use of a bitmap (one bit per block) instead of a linked list of nodes. The use of a radix tree instead of a sequential list gives us nearly O(1) performance no matter how fragmented the tree becomes. I do not get the following:
It is important to note that the FreeBSD VM system attempts to separate clean and dirty pages for the express reason of avoiding unnecessary flushes of dirty pages (which eats I/O bandwidth), nor does it move pages between the various page queues gratuitously when the memory subsystem is not being stressed. This is why you will see some systems with very low cache queue counts and high active queue counts when doing a systat -vm command.
How is the separation of clean and dirty (inactive) pages related to the situation where you see low cache queue counts and high active queue counts in systat -vm? Do the systat stats roll the active and dirty pages together for the active queue count?
Yes, that is confusing. The relationship is goal versus reality. Our goal is to separate the pages but the reality is that if we are not in a memory crunch, we do not really have to. What this means is that FreeBSD will not try very hard to separate out dirty pages (inactive queue) from clean pages (cache queue) when the system is not being stressed, nor will it try to deactivate pages (active queue -> inactive queue) when the system is not being stressed, even if they are not being used.
In the &man.ls.1; / vmstat 1 example, would not some of the page faults be data page faults (COW from executable file to private page)? I.e., I would expect the page faults to be some zero-fill and some program data. Or are you implying that FreeBSD does do pre-COW for the program data? A COW fault can be either zero-fill or program-data. The mechanism is the same either way because the backing program-data is almost certainly already in the cache. I am indeed lumping the two together. FreeBSD does not pre-COW program data or zero-fill, but it does pre-map pages that exist in its cache. In your section on page table optimizations, can you give a little more detail about pv_entry and vm_page (or should vm_page be vm_pmap, as in 4.4, cf. pp. 180-181 of McKusick, Bostic, Karels, Quarterman)? Specifically, what kind of operation/reaction would require scanning the mappings? How does Linux do in the case where FreeBSD breaks down (sharing a large file mapping over many processes)? A vm_page represents an (object,index#) tuple. A pv_entry represents a hardware page table entry (pte). If you have five processes sharing the same physical page, and three of those processes' page tables actually map the page, that page will be represented by a single vm_page structure and three pv_entry structures. pv_entry structures only represent pages mapped by the MMU (one pv_entry represents one pte). This means that when we need to remove all hardware references to a vm_page (in order to reuse the page for something else, page it out, clear it, dirty it, and so forth) we can simply scan the linked list of pv_entry structures associated with that vm_page to remove or modify the ptes from their page tables. Under Linux there is no such linked list. In order to remove all the hardware page table mappings for a vm_page Linux must index into every VM object that might have mapped the page. 
For example, if you have 50 processes all mapping the same shared library and want to get rid of page X in that library, you need to index into the page table for each of those 50 processes even if only 10 of them have actually mapped the page. So Linux is trading off the simplicity of its design against performance. Many VM algorithms which are O(1) or (small N) under FreeBSD wind up being O(N), O(N^2), or worse under Linux. Since the ptes representing a particular page in an object tend to be at the same offset in all the page tables they are mapped in, reducing the number of accesses into the page tables at the same pte offset will often avoid blowing away the L1 cache line for that offset, which can lead to better performance. FreeBSD has added complexity (the pv_entry scheme) in order to increase performance (to limit page table accesses to only those ptes that need to be modified). But FreeBSD has a scaling problem that Linux does not in that there are a limited number of pv_entry structures and this causes problems when you have massive sharing of data. In this case you may run out of pv_entry structures even though there is plenty of free memory available. This can be fixed easily enough by bumping up the number of pv_entry structures in the kernel config, but we really need to find a better way to do it. With regard to the memory overhead of a page table versus the pv_entry scheme: Linux uses permanent page tables that are not throwaway, but does not need a pv_entry for each potentially mapped pte. FreeBSD uses throwaway page tables but adds in a pv_entry structure for each actually-mapped pte. I think memory utilization winds up being about the same, giving FreeBSD an algorithmic advantage with its ability to throw away page tables at will with very low overhead. Finally, in the page coloring section, it might help to have a little more description of what you mean here. I did not quite follow it. Do you know how an L1 hardware memory cache works? 
I will explain: Consider a machine with 16MB of main memory but only 128K of L1 cache. Generally the way this cache works is that each 128K block of main memory uses the same 128K of cache. If you access offset 0 in main memory and then offset 128K in main memory you can wind up throwing away the cached data you read from offset 0! Now, I am simplifying things greatly. What I just described is what is called a direct mapped hardware memory cache. Most modern caches are what are called 2-way-set-associative or 4-way-set-associative caches. The set-associativity allows you to access up to N different memory regions that overlap the same cache memory without destroying the previously cached data. But only N. So if I have a 4-way set associative cache I can access offset 0, offset 128K, offset 256K and offset 384K and still be able to access offset 0 again and have it come from the L1 cache. If I then access offset 512K, however, one of the four previously cached data objects will be thrown away by the cache. It is extremely important… extremely important for most of a processor's memory accesses to be able to come from the L1 cache, because the L1 cache operates at the processor frequency. The moment you have an L1 cache miss and have to go to the L2 cache or to main memory, the processor will stall and potentially sit twiddling its fingers for hundreds of instructions' worth of time waiting for a read from main memory to complete. Main memory (the dynamic RAM you stuff into a computer) is slow when compared to the speed of a modern processor core. OK, so now onto page coloring: All modern memory caches are what are known as physical caches. They cache physical memory addresses, not virtual memory addresses. This allows the cache to be left alone across a process context switch, which is very important. But in the Unix world you are dealing with virtual address spaces, not physical address spaces. Any program you write will see the virtual address space given to it. 
The actual physical pages underlying that virtual address space are not necessarily physically contiguous! In fact, you might have two pages that are side by side in a process's address space which wind up being at offset 0 and offset 128K in physical memory. A program normally assumes that two side-by-side pages will be optimally cached. That is, that you can access data objects in both pages without having them blow away each other's cache entry. But this is only true if the physical pages underlying the virtual address space are contiguous (insofar as the cache is concerned). This is what page coloring does. Instead of assigning random physical pages to virtual addresses, which may result in non-optimal cache performance, page coloring assigns reasonably-contiguous physical pages to virtual addresses. Thus programs can be written under the assumption that the characteristics of the underlying hardware cache are the same for their virtual address space as they would be if the program had been run directly in a physical address space. Note that I say reasonably contiguous rather than simply contiguous. From the point of view of a 128K direct mapped cache, the physical address 0 is the same as the physical address 128K. So two side-by-side pages in your virtual address space may wind up being offset 128K and offset 132K in physical memory, but could also easily be offset 128K and offset 4K in physical memory and still retain the same cache performance characteristics. So page coloring does not have to assign truly contiguous pages of physical memory to contiguous pages of virtual memory; it just needs to make sure it assigns contiguous pages from the point of view of cache performance and operation.
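The idea of a page "color" can be made concrete with a small model of the 128K direct mapped cache used above. This is a sketch under stated assumptions rather than FreeBSD's allocator: the 4K page size is implied by the 128K/132K example, and `color`, `alloc_colored`, `map_region`, and the nearest-color fallback are hypothetical names and policy chosen for illustration.

```python
PAGE_SIZE = 4 * 1024
CACHE_SIZE = 128 * 1024               # direct mapped cache from the example
NUM_COLORS = CACHE_SIZE // PAGE_SIZE  # 32 page-sized slots in the cache


def color(physical_page):
    """Cache slot that a physical page number lands in."""
    return physical_page % NUM_COLORS


def alloc_colored(free_pages, wanted_color):
    """Take a free physical page of the wanted color, else of the next nearest color."""
    for delta in range(NUM_COLORS):
        target = (wanted_color + delta) % NUM_COLORS
        for page in sorted(free_pages):
            if color(page) == target:
                free_pages.remove(page)
                return page
    raise MemoryError("no free pages")


def map_region(free_pages, first_color, num_virtual_pages):
    """Back consecutive virtual pages with physical pages of consecutive colors."""
    return [
        alloc_colored(free_pages, (first_color + v) % NUM_COLORS)
        for v in range(num_virtual_pages)
    ]


# Physical pages at offsets 4K, 128K, 256K and 392K are free.
free = {1, 32, 64, 98}
mapping = map_region(free, first_color=0, num_virtual_pages=3)
```

Here the three virtual pages end up backed by physical pages 32, 1 and 98. They are nowhere near each other in physical memory, yet they occupy three consecutive cache slots, which is all that "reasonably contiguous" requires. Page 64 is left alone because it shares a color with page 32.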