Effective tools, gates, and accountability can help ensure your system’s success

 
 

Many people incorrectly judge Java/J2EE-based systems on problems associated with maintaining the codebase, the number of bugs or inconsistencies in functionality, or poor performance. Fortunately, these problems have less to do with the Java/J2EE technology itself and more to do with the lack of a process focused on the quality of the system. To ensure success of large-scale Java/J2EE projects developed by a sizeable team, or across multiple teams, a team lead must:

  • Use tools that can measure quality
  • Define a set of gates and artifacts derived from the tools
  • Stress accountability to deliver, monitor, and enforce the results

This article explains how incorporating these three tactics into your development strategy can ensure that your team consistently produces quality projects.

Importance of tools
Have you ever heard of a construction company attempting to build a house without a power saw, electric drill, or a tool as fundamental as a hammer? True, a house could be built without today’s new-fangled equipment; however, construction would take much longer, and the same level of quality would prove nearly impossible to achieve. You could build a hut with your bare hands, but you could build a mansion with the right tools.

Today’s developers are no different from a person attempting to build a house. Tools are essential to the developer, both for increasing productivity and for enhancing quality. The tools developers use must enable them to produce the highest quality code possible in the shortest amount of time, which means that today’s IDE is no longer simply a tool used to write, debug, and compile code. Instead, an IDE must help developers identify whether they are following proper coding conventions and known design patterns, whether they comply with industry standards such as those for Web services, whether their code adheres to its contract, and whether it performs per the requirements. In addition, when developers are not given the environments necessary to achieve continuous builds and automated testing, an IDE’s capabilities become even more important to ensuring the system’s quality.

Enter the Eclipse IDE, which provides built-in capabilities that, when used with several plug-ins, can aid in increasing the quality of both the codebase and the system. Eclipse is an open, extensible IDE built for anything and nothing in particular. Eclipse’s Java development environment is open source, free, and fully customizable. Eclipse both enables and promotes the addition of new capabilities via open source and commercially available custom-built plug-ins. By utilizing Eclipse, along with a key set of plug-ins illustrated in the Eclipse plug-in matrix shown in Figure 1, it is possible for a developer, and a team, to measure the quality of any J2EE- or Java-based system.

Figure 1. Eclipse plug-in matrix

Controlling the quality of a system is impossible if you cannot measure and monitor it. It is important to understand key areas in a system that warrant measurement. These areas include a system’s maintainability, reliability, and performance. While this list is obviously not all-inclusive, these three items are highly suited as the basic building blocks for ensuring the quality of a system.

Maintainability involves the complexity associated with understanding the code or modifying the code, whether it is a bug fix or an enhancement. Well-documented code that follows known coding standards and industry design standards is easier to maintain than code with sparse documentation that doesn’t follow any known standard development practices. Highly maintainable code allows changes to be introduced more quickly, thereby permitting the business to respond more rapidly to new requirements or change requests, and ultimately reducing the overall cost of both new features and ongoing maintenance.

Reliability indicates whether a method adheres to its contract and can be executed successfully. Unit tests are used to exercise a method’s contract, thus verifying the reliability of the code segment. The quality of the unit tests, in turn, is verified via code coverage analysis. Many approaches are available for measuring code coverage, including, but not limited to, statement, decision, condition, path, and call analysis. The type and amount of coverage necessary to provide absolute reliability of a method is a popular discussion topic. For this article’s purpose, simply note that reliability increases as the amount of code coverage increases.
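To make the difference between two of these coverage types concrete, here is a minimal sketch that uses hand-written instrumentation. The class CoverageDemo, its discount method, and the probe counters are hypothetical and exist only for illustration; real coverage tools insert equivalent probes automatically.

```java
import java.util.BitSet;

/** Hypothetical hand-instrumented method; coverage tools normally add such probes for you. */
final class CoverageDemo {
    // One bit per statement probe, one bit per decision outcome (true / false).
    static final BitSet statements = new BitSet(3);
    static final BitSet decisionOutcomes = new BitSet(2);

    static int discount(int price, boolean member) {
        statements.set(0);
        int result = price;
        if (member) {
            decisionOutcomes.set(0);       // the decision evaluated to true
            statements.set(1);
            result = price - price / 10;   // members get 10 percent off
        } else {
            decisionOutcomes.set(1);       // the decision evaluated to false (probe only)
        }
        statements.set(2);
        return result;
    }

    /** Fraction of the three statement probes that have been hit. */
    static double statementCoverage() { return statements.cardinality() / 3.0; }

    /** Fraction of the two decision outcomes that have been hit. */
    static double decisionCoverage() { return decisionOutcomes.cardinality() / 2.0; }

    static void reset() { statements.clear(); decisionOutcomes.clear(); }
}
```

A single test that calls discount(100, true) executes every statement, yet it exercises only one of the two decision outcomes. Statement coverage alone can therefore look complete while a branch remains untested.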

Method reliability within a system is of the utmost importance as it represents, to some extent, a system’s stability. Other problems, such as performance or scalability issues, could arise, which may not be as readily found even with extensive unit testing and coverage analysis. Thus, unit testing and coverage analysis are by no means a be-all, end-all solution to ensuring system stability; however, the ability to reliably execute methods consistently represents a good measuring stick of the system’s reliability.

Performance is typically measured on a per-unit-of-time basis. The number of requests a system can process, the amount of information sent over the wire, and the response time of a particular system call are all performance criteria measured against a unit of time. It is important to know, to some extent, how the system will perform. To ensure this understanding, one could measure all major service methods or potential problem areas with expected high usages, long call stacks, or those pieces that represent the most common paths through the core architecture. Each approach provides a varying level of comfort with regard to performance. For large-scale systems, performance should be continually maintained and monitored during development to identify snags early and avoid unforeseen problems in production environments.
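As a rough illustration of measuring a call per unit of time, the sketch below averages the elapsed time of repeated invocations and converts it into throughput. CallTimer and its method names are hypothetical; the numbers it produces are only indicative, since JIT warm-up and garbage collection affect such simple timings, and a profiler or load-testing tool gives a fuller picture.

```java
import java.util.function.Supplier;

/** Hypothetical micro-timer for a single service call; not a substitute for a profiler. */
final class CallTimer {
    /** Runs the call the given number of times and returns the average elapsed nanoseconds. */
    static <T> long averageNanos(Supplier<T> call, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            call.get();
        }
        return (System.nanoTime() - start) / runs;
    }

    /** Converts an average call time into throughput, expressed as calls per second. */
    static double callsPerSecond(long averageNanos) {
        return averageNanos == 0 ? Double.POSITIVE_INFINITY : 1_000_000_000.0 / averageNanos;
    }
}
```

Recording such figures for the major service methods at each build makes a performance regression visible long before production.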

Let’s now examine how Eclipse and its plug-ins can help a development team measure the maintainability, reliability, and performance of any Java- or J2EE-based system.

Code generation: Maintainability
Code generation is one of the best ways to ensure consistency and quality for repeatable code that differs based on type. XDoclet is currently the industry standard for generating Java source code. XDoclet is an open source, free library that parses the codebase, looking for custom Javadoc tags (metadata) that it then uses to generate other Java source files. XDoclet contains a set of Javadoc tags that may be used to generate most of the repeatable code found in the majority of Java/J2EE-based systems, such as JavaBeans and home and remote classes for Enterprise JavaBeans, even providing specific information for Borland Enterprise Server, JBoss, Orion, Resin, Sun Java System Application Server, WebLogic, and WebSphere. It also supports many other technologies such as Hibernate, JDO (Java Data Objects), and Castor. If that isn’t enough, XDoclet is extensible, allowing developers to create their own custom tags for generating homegrown code. By employing XDoclet and letting it generate code for you, you can reduce unnecessary coding errors and bugs found in repetitive code.

Code metrics: Maintainability
Due to the large size of codebases, visually monitoring the entire codebase is unachievable. Instead of walking through every line of code, metrics may be used to identify both existing and potential problems. The Metrics plug-in for Eclipse is an open source, free tool that can generate metrics on a per-class, per-package, or per-project level. The results may be exported to an XML file for historical purposes. It also contains an Ant task that can be used to generate the XML file at build time.

The Metrics Eclipse plug-in provides more than 23 types of metrics. Some of the most important metrics include number of interfaces, depth of inheritance tree, number of overridden methods, McCabe’s cyclomatic complexity, afferent coupling, efferent coupling, and abstractness. By default, the Metrics view displays the compliant metrics in blue and the outliers in red. Anything in red represents a possible problem in the code that should be reviewed. The plug-in has its own view and runs in the background; thus, it can be used not only by a team lead at the end of an iteration, but also by developers as they code.
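To show how the coupling metrics relate to one another, here is a small sketch of Robert C. Martin’s package-level formulas: instability I = Ce / (Ca + Ce), abstractness A = abstract types / all types, and normalized distance from the main sequence D = |A + I - 1|. The class PackageMetrics and the sample counts are hypothetical, and it is an assumption that a given tool computes its values exactly this way.

```java
/** Package-level design metrics after Robert C. Martin; the inputs are plain counts. */
final class PackageMetrics {
    /** Instability I = Ce / (Ca + Ce): 0 means maximally stable, 1 maximally unstable. */
    static double instability(int afferent, int efferent) {
        int total = afferent + efferent;
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    /** Abstractness A = abstract classes and interfaces divided by all types in the package. */
    static double abstractness(int abstractTypes, int totalTypes) {
        return totalTypes == 0 ? 0.0 : (double) abstractTypes / totalTypes;
    }

    /** Normalized distance from the main sequence, D = |A + I - 1|; closer to 0 is better. */
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }
}
```

For example, a package that three other packages depend on (Ca = 3), that depends on one package itself (Ce = 1), and that contains one interface among four types has I = 0.25, A = 0.25, and D = 0.5. Many dependents combined with little abstraction makes such a package rigid, which is the kind of outlier worth reviewing.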

Code reviews: Maintainability
A code review is a useful exercise that helps ensure code quality, while also training and mentoring developers on coding style and coding best practices. Jupiter is an open source, free tool that allows team-based code reviews. Jupiter uses XML files to track individual team member reviews with a review ID and a reviewer ID (the review ID is shared for team-based reviews). These files are then checked into source control and made available to other developers, allowing for multiple reviews to occur simultaneously, without a server managing the reviews.

Team members check these XML files in and out of source control to see what needs fixing as well as what other team members have already fixed. Jupiter is a great way to tag required fixes without enduring the rigor that normally surrounds bug-tracking policies during new development. It frees team members to review code at their convenience, instead of forcing them to stop development, perhaps in the middle of solving a problem, to attend a code review meeting.

Standards adherence: Maintainability
Development standards exist to ensure past mistakes are not repeated. In addition, they ensure the code is consistent and more readable for the developer who must later maintain the code. Eclipse comes with a built-in code formatter that adheres to Java coding conventions. While this is a great feature, a code formatter is not enough to ensure the production of quality code.

Checkstyle builds upon Eclipse’s code-formatting capabilities, adding more than syntactical checks. It can identify noncompliant code blocks, coding problems, duplicate code, and some metrics violations. Even better, Checkstyle is extremely customizable, allowing the user to tailor the types of checks and their level of severity per the development standards within the organization. The default configuration file that comes with the Checkstyle plug-in is extremely comprehensive. Even so, I suggest developers invest their own time to customize this plug-in to match their organization’s development needs. Once customized, the Checkstyle configurations may be exported and used from project to project. The results of Checkstyle are displayed in the Problems view, which may be filtered and sorted. Results may be viewed at the folder, working-set, or resource (file) level, allowing developers to understand the quality of the overall codebase, subsystem, or individual class.

If your team is engaged in Web services development, then WSVT (Web Services Validation Tools) is a must-have plug-in. WSVT can determine if a Web service conforms to the guidelines and requirements defined in the WS-I (Web Services Interoperability) Basic Profile. Developers can right-click on a WSDL (Web Services Description Language) file, and it validates the WSDL and generates a report in a custom view that displays any violations. As an added bonus, the WSVT plug-in monitors TCP/IP traffic and observes, captures, and validates SOAP messages. The WSVT plug-in thus ensures compliance both at the interface and message level.

Functional testing: Reliability
Eclipse ships with both Ant and JUnit. Ant is the de facto industry standard for building Java-based applications. JUnit is a Java-based framework for creating unit tests.

Developers can set up individual JUnit tests to run in Eclipse, which provides a special view for the JUnit results, or use the Ant JUnit or JUnitReport task. The JUnitReport Ant task generates an HTML viewable report that may be used to represent the entire system’s tests, or specific tests, depending upon the customization. The HTML report is an excellent report to save for historical purposes as it can be used to gauge the project’s quality. By using JUnit for unit testing, developers can ensure their methods adhere to their contract, thus avoiding bugs that arise due to noncompliance.

Code coverage: Reliability
When developers write unit tests, they should understand how much codebase coverage their unit tests provide. GroboCodeCoverage is an open source coverage tool that integrates with Ant by providing an Ant task that can generate coverage reports. Individual reports may be generated, such as the Line Count Report and Function Count Report, which provide coverage percentages at the line and method level, respectively. However, the cornerstone of the tool is its Source Summary Coverage Report. This report mimics the Javadoc structure and provides a quality professional report that may be stored for historical purposes. As the project progresses, the report can be referenced to understand whether code coverage increases or decreases over time. By using GroboCodeCoverage, developers can ensure their most critical code pieces are fully exercised and pinpoint areas that lack coverage. Using this information, they may add or update existing JUnit tests, thus increasing system reliability.

Profiler: Performance
As projects near the end of the construction phase, developers tend to start thinking more about performance. Profiler for Eclipse is an open source, free profiler that provides many of the features that a developer needs for solving common performance issues. It shows the threads, the heap size, heap dump, method calls, method times, calls per package, and thread call-trees that allow a developer to see where time was spent through the flow of a call. Profiler for Eclipse helps developers understand where the application’s bottlenecks are occurring, allowing them to correct problems before releasing the application to the QA (quality assurance) department or production.


[Repost] Using Lucene with OJB

December 24, 2004

Search is important! All too often, search looks like where thing like ‘%that%’. Users know Google, and quite a few even know its query language at this point. Aside from our wanting to provide more functionality in search, users expect it. Google seems simple, doesn’t it?

Enter Lucene. I’ll presume you’ve at least heard of it, if not used it. Lucene does full-text indexing, and that is it. It does this really well. The beauty (well, one) is that you can index anything. In this case, I’ll index an object being persisted by OJB. The key is to embed the information required to retrieve the document being indexed.

Take a gander at a fairly simple Student class (this is from an app I am doing for my little brother, who is a professor (of such terrible subjects as rock climbing and white water kayaking, don’t get me started)).

The primary use case for this application is for a student coop employee to find a student in the system, then find gear and check the gear out for the student. Finding the student is key, and that is best served by… searching! So we have a database record for each student, and want a convenient search facility that can search based on name, student id (idNumber), phone number, even address. Lucene makes this a snap. To do it, we just store the id (internal/pk id) in an unindexed field when we add a student in the StudentIndexer:

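public void add(final Student student) throws ServiceException {
    // Searchable text fields: tokenized, indexed, and stored.
    final Document doc = new Document();
    doc.add(Field.Text(NAME, student.getName()));
    doc.add(Field.Text(ID_NUMBER, student.getIdNumber()));
    doc.add(Field.Text(ADDRESS, student.getAddress()));
    doc.add(Field.Text(PHONE, student.getPhone()));
    // Primary key: stored with the document but not indexed.
    doc.add(Field.UnIndexed(IDENTITY, student.getId().toString()));
    try {
        // Only one writer may have the index open at a time.
        synchronized (mutex) {
            // false = append to the existing index instead of creating a new one
            final IndexWriter writer = new IndexWriter(index, analyzer, false);
            try {
                writer.addDocument(doc);
                writer.optimize();
            }
            finally {
                // Close the writer even if indexing fails, so its lock is released.
                writer.close();
            }
        }
    }
    catch (IOException e) {
        throw new ServiceException("Unable to index student", e);
    }
}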

Notice the UnIndexed field on the Document? This tells Lucene to store the field with the record but not to index it or search on it. When you retrieve the document, you still get the field, though. A perfect place to stash the primary key.

When we look for students, we don’t want to get back Lucene Document instances, though; we want to go ahead and get the nice domain model instances of Student. What we’ll do is query against the index, pull the pks for all the hits out, then select the domain objects using those pks (from the StudentIndex):

public List findStudents(final String search) throws ServiceException {
    return this.findStudents(search, Integer.MAX_VALUE);
}

public List findStudents(final String search, final int numberOfResults) throws ServiceException {
    // Parse the user's query; terms without a field prefix search the name field.
    final Query query;
    try {
        query = QueryParser.parse(search, StudentIndexer.NAME, analyzer);
    }
    catch (ParseException e) {
        throw new ServiceException("Unable to make any sense of the query", e);
    }
    // Collect the primary keys of the hits, in hit order.
    final ArrayList ids = new ArrayList();
    try {
        final IndexReader reader = IndexReader.open(index);
        final IndexSearcher searcher = new IndexSearcher(reader);
        final Hits hits = searcher.search(query);
        for (int i = 0; i != hits.length() && i != numberOfResults; ++i) {
            final Document doc = hits.doc(i);
            ids.add(new Integer(doc.getField(StudentIndexer.IDENTITY).stringValue()));
        }
        searcher.close();
        reader.close();
    }
    catch (IOException e) {
        throw new ServiceException("Error while reading student data from index", e);
    }
    // Load the domain objects, then put them back into hit order.
    final List students = dao.findStudentsWithIdsIn(ids);
    Collections.sort(students, new Comparator() {
        public int compare(final Object o1, final Object o2) {
            final Integer id_1 = ((Student) o1).getId();
            final Integer id_2 = ((Student) o2).getId();
            if (id_1.equals(id_2)) {
                return 0;
            }
            // Whichever id appears first in the hit list sorts first.
            for (int i = 0; i != ids.size(); i++) {
                final Integer integer = (Integer) ids.get(i);
                if (integer.equals(id_1)) {
                    return -1;
                }
                if (integer.equals(id_2)) {
                    return 1;
                }
            }
            return 0;
        }
    });
    return students;
}

The findStudents(String, int): List method is a little more complex than I like, as it does several things: it queries the Lucene index, extracts the primary keys of the hits, queries for the students matching those pks (via the StudentDAO), and finally sorts the results (there is no way to specify the sort order in the students query; it has to follow the order of the hits from the Lucene query). With that, though, we support queries such as Tiffany, which is simple, or a more fun one, name: Aching phone: ???-1234 or what not. Go look at the Lucene query parser syntax. It is worth noting that the above query defaults to searching on the name field if no specific field is specified. This seems to make sense to me =)
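The sorting step can also be expressed with a precomputed rank table instead of scanning the id list on every comparison. The following is a minimal sketch under stated assumptions: it uses generics and lambdas from later Java versions than the code above, and HitOrder, inHitOrder, and the idOf accessor are hypothetical names rather than part of the original application.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: restores search-hit order on objects loaded by primary key. */
final class HitOrder {
    /** Returns a copy of the loaded objects, sorted by the position of each id in rankedIds. */
    static <T> List<T> inHitOrder(List<T> loaded, List<Integer> rankedIds,
                                  Function<T, Integer> idOf) {
        // Map each id to its position in the hit list; the first occurrence wins.
        Map<Integer, Integer> rank = new HashMap<>();
        for (int i = 0; i < rankedIds.size(); i++) {
            rank.putIfAbsent(rankedIds.get(i), i);
        }
        List<T> sorted = new ArrayList<>(loaded);
        // Objects whose id is missing from the hit list sort last.
        sorted.sort(Comparator.comparingInt(
                item -> rank.getOrDefault(idOf.apply(item), Integer.MAX_VALUE)));
        return sorted;
    }
}
```

With a Student type, the call would look like HitOrder.inHitOrder(students, ids, Student::getId). Each comparison is then a constant-time map lookup, which matters only when a query returns many hits.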

If you look at the StudentIndex and StudentIndexer, you will see there are also facilities for adding and removing documents from the Lucene index. This becomes important on any insert/update/delete operation. The update is important to catch, as you need to remove the old entry and insert a new one in the index. In my opinion, this is best done via an aspect that picks these operations out. That is outside the scope of this article, though ;-)

For a larger application with more things being indexed (this just has two searchable domain types) I might generalize the search capability via a DocumentFactory such as:

public class BeanDocumentFactory implements DocumentFactory {
    public Document build(Object entity) {
        final Document document = new Document();
        try {
            // Discover the bean's properties through its getters.
            final BeanInfo info = Introspector.getBeanInfo(entity.getClass());
            final PropertyDescriptor[] props = info.getPropertyDescriptors();
            for (int i = 0; i != props.length; ++i) {
                final PropertyDescriptor prop = props[i];
                final String name = prop.getName();
                final Method reader = prop.getReadMethod();
                if (reader == null) {
                    continue; // write-only property, nothing to index
                }
                // Index every readable property as a searchable text field.
                final Object value = reader.invoke(entity, new Object[]{});
                final Field field = Field.Text(name, String.valueOf(value));
                document.add(field);
            }
        }
        catch (Exception e) {
            throw new RuntimeException("Handle these in real application", e);
        }
        return document;
    }
}

But I have not needed to generalize it for a real project yet =)
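To see what the introspection step picks up without involving Lucene, here is a small standalone sketch that collects readable bean properties into a map. BeanFields, describe, and the Person bean are hypothetical names used only for illustration. One detail worth noticing: Introspector.getBeanInfo with a single argument also reports the class property inherited from Object.getClass(), so this sketch passes Object.class as the stop class to leave it out.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Lucene-free sketch: collects readable bean properties as name -> string value. */
final class BeanFields {
    /** Hypothetical sample bean, standing in for a domain class such as Student. */
    public static class Person {
        private final String name;
        private final int age;
        public Person(String name, int age) { this.name = name; this.age = age; }
        public String getName() { return name; }
        public int getAge() { return age; }
    }

    static Map<String, String> describe(Object entity) {
        try {
            // The stop class excludes properties declared by Object, such as "class".
            BeanInfo info = Introspector.getBeanInfo(entity.getClass(), Object.class);
            Map<String, String> fields = new TreeMap<>(); // sorted for stable output
            for (PropertyDescriptor prop : info.getPropertyDescriptors()) {
                Method reader = prop.getReadMethod();
                if (reader == null) {
                    continue; // write-only property, nothing to read
                }
                fields.put(prop.getName(), String.valueOf(reader.invoke(entity)));
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalStateException("Unable to introspect " + entity.getClass(), e);
        }
    }
}
```

Each map entry corresponds to one field that a document factory of this kind would add to the index.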

Speaking of Lucene (which rocks), I am eagerly anticipating Erik Hatcher’s new book, Lucene in Action. If it is anything like Erik and Steve Loughran’s Java Development with Ant, Lucene will be a lucky project to have it in circulation.

About the author

Brian McCallister
Blog:
http://kasparov.skife.org/blog/

[Repost] IDE Enhancement Tools

December 18, 2004
IDE Enhancement tools extend and improve Integrated Development Environments (IDE) such as Visual Studio .NET and Delphi, usually via their extensibility architectures. These products aim to improve developer productivity by eliminating repetitive tasks and simplifying operations through automation.
IDE Enhancement Tools by Developer Express
CodeRush for Delphi V7 – The Fastest, Easiest, and Most Powerful Way to Program in Delphi. CodeRush is the ultimate productivity toolset that binds seamlessly into the Delphi IDE. It has been designed to enhance Delphi’s built in editor and to provide features and capabilities necessary to program at the speed of thought. Though Delphi provides a basic editor (which most people use because it understands dpr, dfm and pas file relationships), it does not provide advanced features available with CodeRush. more…

Website: www.devexpress.com

Add-in Express (ADX) V2.2 – Add-in Express (ADX) is a tool for creating COM add-ins for Microsoft Office Family applications. With Add-in Express you can extend standard menus and toolbars of the host application (command bars), add your own toolbars (command bars), option pages and property pages, and add new or enhance the built-in functionality of the host application more…

Website: www.afalinasoft.com

Flywheel Professional 7.2 – Flywheel Professional 7.2 is an agile, code centric tool for designing, visualizing and refactoring Microsoft® Visual Studio® .NET 2003 solutions. Live linked synchronization allows the Flywheel and Visual Studio .NET tools to be used in any order, at any time in the development process. Version 7.2 features include fast incremental loading of large solutions, tightly integrated design tools, and complete cross language refactoring support with an integrated solution wide reference analyzer. more…

Website: http://www.velocitis.com

TMS Plugin Framework V6.0 – Makes building applications with plugins easy. more…

Website: www.tmssoftware.com

Eiffel ENViSioN! V2.01 – Eiffel ENViSioN!™ is a plug-in for Visual Studio .NET – its icon appears in the same place the other languages do, but that’s where the similarity ends. Eiffel ENViSioN! enables you to use the powerful features of the Eiffel language, including Design by Contract™ (a means for producing bug-free software, native only to Eiffel), multiple inheritance, generics, and many others. You can use it to be more productive than you ever dreamed. more…

Website: www.eiffel.com

TurboVB V3.2.5 – TurboVB V3.1 is an integrated, feature-packed add-in tool for Microsoft® Visual Basic™ 6; a major enhancement to the VB IDE. It provides forty add-ins plus many other functions that provide great productivity gains. It helps to remove some of the repetitive elements in VB programming, which allows you to focus on your more important tasks. The tool is designed to be highly integrated and configurable, to suit your particular requirements. more…

Website: www.turbodeveloper.com

Visual Assist X 10 – Visual Assist X boosts productivity with powerful editing features that are completely integrated into your Microsoft development environment. Visual Assist X increases automation, simplifies navigation and displays vital information as you develop. Features include upgraded Intellisense, enhanced syntax coloring, and suggestions as you type. Install in two minutes. more…

Website: www.wholetomato.com

ActiveOptimizer Professional Bundle V2.0 – 4 Must Have Tools in One Professional Package: pdProfiler2 – Find and solve your bottlenecks in just minutes. pdGuideBook2- complete database of optimization knowledge. pdAddin2 – VB IDE Addin productivity tools improves your development and pdSpeed DLL – Utility/Performance DLL complete with source code. more…

Website: www.platformdev.com

IntelliJ IDEA V3.0 – JetBrains IntelliJ IDEA is the award-winning Java IDE that has been capturing the minds and hearts of developers worldwide. Intelligence & usability – Java, JSP, XML and HTML expertise – code completion, formatting, live templates, code inspection – any feature is available with a single key stroke. refactoring automation – more than 30 important refactoring tools to help you not only change application structure, but to assist in everyday coding and more… more…

Website: www.intellij.com

CodeSMART VB6 2005 – CodeSMART is an add-in for Microsoft Visual Basic 5.0 and 6.0 that will bring you power, productivity and refinement. With CodeSMART the Visual Basic IDE becomes the efficient, powerful and high-quality programming environment of your dreams. more…

Website: www.axtools.com

VBCodeHelper V5.4.5 – VBCodeHelper is a multi-function Add-In program for Microsoft Visual Basic 6.0. VBCodeHelper adds its own menu, and 15 buttons and a combo box to its own toolbar in the Visual Basic IDE. It is also supplied with a configuration program that allows you to configure the add-in to your exact requirements. All the code insertion options are fully template driven and so can be customised to your exact requirements. more…

Website: www.frez.co.uk

VB Advantage V6.0 – VB Advantage is a powerful, critically acclaimed, VB development productivity utility that enhances VB’s design time environment. VB Advantage has many powerful, helpful, and easy-to-use features and tools that help you produce better code while saving time and eliminating those mind-numbing redundant tasks. VB Advantage was conceived to support software engineering development activities that developers do as they create, test, and maintain application code. more…

Website: www.advantageware.com

Mere Mortals .NET – The Mere Mortals .NET Framework continues our tradition of developer tools that make following best practices easy. It is written in C# and designed to be used in your .NET application development efforts with C#, VB .NET, or C++. Two years of tapping .NET’s strengths and working around its pitfalls went into MM .NET Framework. Below you will find many of the reasons why you should take a serious look at Mere Mortals .NET. more…

Website: www.oakleafsd.com

CodeObject V3.1 – CodeObject significantly improves C# development performance by automating and standardizing mundane programming tasks, reducing errors and input time, and improving navigation. CodeObject is a Visual Studio .NET add-in that increases developer productivity for C# projects. more…

Website: www.codeobject.com

CodePro Express – CodePro Express is a set of professional software development tools that extend the Java development environment contained in IBM WebSphere Application Server Express and the Eclipse platform. CodePro Express offers software developers the most complete, economical and efficient way to create applications for deployment on WebSphere Express. CodePro Express gives developers a rich set of tools that are packed with valuable features. more…

Website: www.Instantiations.com

Thread Factory V4.0 – Thread Factory is a component library for creating robust Multi-Threaded Visual Basic 6 Applications. Thread Factory includes a wealth of features including an asynchronous calling convention that parallels COM+ with Begin_xxx and Finish_xxx methods and asynchronous error handling. Thread Factory does not create or use ActiveX EXE, it creates true multi-threaded VB6 applications. more…

Website: www.halfx.com

Project Browser+ V4.3 – Project Browser+ v4.3 is a new and improved project explorer for VB6! Integrating every usable portion of the VB6 extensibility interface, all pieces of the IDE and your projects are gathered in one place with customizable views. Locate projects, components, members, methods, properties and more with a click. Group project tasks, source encryption, printing and more to enhance your VB6 IDE dramatically. more…

Website: www.dev4dev.com

[Repost] Letter from IBM’s CEO to employees on Lenovo’s acquisition of IBM’s Personal Computing Division
2004-12-17 12:59:04

Dear IBMer:
  
  I have important news to share with you.
  
  Today we announced a definitive agreement with the Lenovo Group, China’s largest manufacturer and distributor of personal computers. Lenovo will acquire IBM’s Personal Computing Division, creating the third-largest PC business in the world. Headquartered in New York, Lenovo’s new PC business will be our preferred supplier of IBM- and Think-branded PCs. This will allow us to continue to provide our clients with end-to-end integrated solutions — but with the advantages of a PC business with unique and powerful capabilities: significant economies of scale, global distribution channels, strong brand recognition, an experienced and expert management team and workforce, and number one market position in the world’s fastest growing IT market. In addition, IBM will be Lenovo’s preferred services and financing provider. IBM will hold an 18.9 percent equity stake in Lenovo.
  
  Today’s announcement is the latest action we have taken in recent years to reposition IBM for leadership in a rapidly changing industry. For some time we have said that there are two ways to create long-term value for clients and shareholders in the IT industry: Invest heavily in R&D and be the high-value innovation provider for enterprises, or differentiate by leveraging vast economies of scale, high volumes and price.
  
  IBM is an innovation company. We are committed to being the premier IT solutions partner for enterprises of all sizes, in all industries. This business model requires that we continuously create intellectual capital and that we reinvent everything we do — our technologies, products and services, our culture and our portfolio of businesses. This has been the hallmark of our company and has enabled IBM to grow and to lead countless product and technology cycles over many decades.
  
  Today, computing and its uses are again changing radically — to what we’ve been describing as on demand business. This is opening up tremendous opportunities for IBM, and it’s why we have invested billions of dollars in recent years to strengthen our capabilities in hardware, software, services and core technologies focused on transforming the enterprise. At the same time, the PC business is rapidly taking on characteristics of the home and consumer electronics industry, which favors economies of scale, pricing power and a focus on individual users and buyers. These are very different business and economic models, and they will diverge even further in the years ahead.
  
  By combining our personal computing division with its own, highly complementary business, Lenovo will be much better positioned to capture the opportunities in the PC industry. Lenovo is committed to investing in, growing and winning in PCs. Lenovo will be a formidable competitor, and our alliance gives IBM an even stronger position in China, while strengthening our brand presence there.
  
  Of course, IBM will continue to play a significant role in creating innovations for individuals — and not only through the broad PC alliance we are announcing today. As you know, our company is investing heavily to create the computing platform of the future. Our microprocessors and open software technologies underpin the next-generation on demand infrastructure, which will extend from the enterprise to the home to the mobile product to an expanding array of connected devices. In just the past couple of weeks, we’ve made significant announcements about advanced microprocessors we are developing in partnership with some of the world’s leading consumer electronics companies, as well as initiatives to broaden adoption and support for these open platforms.
  
  We are excited by today’s announcement and how it further positions IBM for leadership in the days and years ahead.
  
  
  Sam Palmisano
  Chairman and Chief Executive Officer

XSL Processors

December 5, 2004

DocBook XSL: The Complete Guide

Chapter 2. XSL processors

An XSL processor is the software that converts an XML file into formatted output. There is a growing list of XSL processors to choose from. Each tool implements parts or all of the XSL standard, which actually has several components:

The XSL Standards

Extensible Stylesheet Language (XSL)
A language for expressing stylesheets written in XML. It includes the XSL formatting objects (XSL-FO) language, but refers to separate documents for the transformation language and the path language.

XSL Transformation (XSLT)
The part of XSL for transforming XML documents into other XML documents, HTML, or text. It can be used to rearrange the content and generate new content.

XML Path Language (XPath)
A language for addressing parts of an XML document. It is used to find the parts of your document to apply different styles to. All XSL processors use this component.

To publish HTML from your XML documents, you just need an XSLT processor. It will include the XPath language since that is used extensively in XSLT. To get to print, you need an XSLT processor to produce an intermediate formatting objects (FO) file, and then you need an XSL-FO processor to produce PostScript or PDF output from the FO file. A diagram of the DocBook Publishing Model is available if you want to see how all the components flow together.
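The role XPath plays can be seen in a small sketch. It uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath and is not one of the XSL processors discussed in this chapter; the sample document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny DocBook-like document, made up for this illustration.
DOC = """
<book>
  <title>Sample Book</title>
  <chapter id="intro">
    <title>Introduction</title>
    <para>First paragraph.</para>
  </chapter>
  <chapter id="xsl">
    <title>XSL processors</title>
    <para>Second paragraph.</para>
    <para>Third paragraph.</para>
  </chapter>
</book>
"""

root = ET.fromstring(DOC)

# Path expression: the title of every chapter (the book title is skipped).
chapter_titles = [t.text for t in root.findall("./chapter/title")]

# Attribute predicate: paragraphs inside the chapter whose id is "xsl".
xsl_paras = [p.text for p in root.findall("./chapter[@id='xsl']/para")]

# Descendant search: every para element anywhere below the root.
all_paras = root.findall(".//para")

print(chapter_titles)
print(xsl_paras)
print(len(all_paras))
```

A stylesheet uses expressions of this kind to select the parts of the document each template or style applies to.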

XSLT processors

Currently there are three processors that are widely used for XSLT processing because they most closely conform to the XSLT specification:

Saxon
Saxon was written by Michael Kay, the author of XSLT Reference, one of the best books on XSLT. Saxon is a free processor written in Java, so it can be run on any operating system with a modern Java interpreter. It uses the Aelfred XML parser internally, which has some bugs, so many people substitute the Xerces parser.

Xalan
Xalan is part of the Apache XML Project. It has versions written in both Java and C++, both of them free. The Java version is described in this book because it is highly portable and more fully developed. Generally Xalan is used with the Xerces XML parser (Java or C++), also available from the Apache XML Project.

xsltproc
The xsltproc processor is written in C by Daniel Veillard. It is free, as part of the open source libxml2 library from the Gnome development project. It is considered the fastest of the processors, and is highly conformant to the specification. It is much faster than either of the Java processors. It also processes basic XIncludes.

There are a few other XSLT processors that should also be mentioned:

XT
James Clark’s XT was the first useful XSLT engine, and it is still in wide use. It is written in Java, so it runs on many platforms, and it is free. XT comes with James Clark’s nonvalidating parser XP, but you can substitute a different Java parser.

MSXML
Microsoft’s MSXML engine includes an XSLT processor. It is reported to be fast, but only runs on Windows.

Sablotron
Sablotron is an XSLT processor written in C++, from Ginger Alliance.

4XSLT
4XSLT is an XSLT processor written in Python, from FourThought LLC.

[Repost] CMS Tutorial (1)

December 3, 2004

[Repost] CMS Tutorial (1)
2004-12-3 16:36:42

CMS Tutorial

What is a Content Management System (CMS)?

A Content Management System (CMS) is a combination of a large database, a file system, and other related software modules that are used to store and later retrieve huge amounts of data. These systems differ from ordinary databases in that they can index text, audio clips, video clips, or images in a database. Users of a content management system can find relevant content within the database by searching for keywords, authors, date of creation, and so on. Content management systems can also be used to create information portals that serve as the backbone of data management. Along with the database-handling facilities, the software modules also allow anyone to contribute information to a website via a graphical user interface (GUI). Pages are usually based on a pre-written template that acts as a platform for each page in the site as it is created.
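The idea of a pre-written template that every page is built on can be sketched in a few lines. This is only an illustration using Python's standard `string.Template`; the template layout, the field names, and the `render_page` helper are assumptions and not part of any particular CMS.

```python
from html import escape
from string import Template

# Hypothetical site-wide template: every page shares this layout.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>By $author</p><div>$body</div></body></html>"
)


def render_page(title, author, body):
    """Fill the shared template with one page's content.

    All fields are treated as plain text and HTML-escaped, so a
    non-technical author cannot break the page layout by accident.
    """
    return PAGE_TEMPLATE.substitute(
        title=escape(title),
        author=escape(author),
        body=escape(body),
    )


page = render_page("Company News", "Sales & Marketing", "We moved offices.")
print(page)
```

Because the layout lives in one place, changing the template changes the look of every page, while authors only ever supply the content fields.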

At the company level, content management systems store and manage an organization’s electronic documents and Web content so that the company’s employees can reuse the information across different applications. The Web content can also be distributed to customers and business partners outside the organization. The core function of the CMS is to manage content during its entire lifecycle, that is, from creation through publishing. The content of the CMS can also be shared by e-commerce and customer relationship management (CRM) systems. A Web content management system enables you to establish a consistent look and feel throughout your site, while giving your non-technical content authors the power to publish and update their own content using simple but powerful browser-based tools. Some CMSs integrate with content delivery applications to deliver the content via a web site.

There are three basic participants in the Content management system:

・ Content Editors (Decide what content to publish and where)

・ Content Publishers (Publish the content on the web)

・ Content Authors (Create the content for the web)

A CMS allows non-technical authors and editors to publish their content easily and quickly, work that would otherwise be done by technical programmers. A CMS establishes defined publishing processes and assigns specific publishing rights to various individuals. By using these facilities, the company can save training time while enabling more people to publish. It also reduces the daily stream of calls to the IT department for changes to the website. A CMS reduces time-to-publish, allowing you to get content published faster. This is an important issue for the modern organization: the quicker you get key content published, the more value it creates. A wide range of content can be published using the CMS.

This content can be characterized as:

・ Simple pages for normal presentation

・ Complex pages, with specific layout and presentation

・ Dynamic information that is sourced from databases and changes at regular intervals

・ Training material

・ Online manual

・ General business documents

・ Thousands of pages in total for different categories of customers

・ Extensive linking between pages

Who needs a Content Management System (CMS)?

In today’s world of e-business, content flow is almost as crucial as cash flow. If an enterprise cannot refresh the information about its products on a continuous basis, it will not be able to meet today’s Internet-based expectations. A company that wants to increase its content flow without spending a lot of money, and with fewer problems, can choose a content management system as a way to automate the content gathering and delivery process. A company or organization needs a CMS if it meets at least four of the requirements listed below:

・ It is a big organization where web publishing is spread over many places, and communicating content between different branches is very time consuming

・ The web site of the company is big and there are frequent updates of content or structure

・ The online operation performs personalization

・ Very frequent content integration between the web site and retail outlets, call centers, email newsletters or other channels

・ Strong requirement to manage specifications from R&D to customer support

・ The company has customers who also contribute to the site

・ One individual has intimate knowledge of the entire site (and others have intimate knowledge only of their own sections), so changes cannot be made without the help of that specific individual.

If the organization finds that it needs a content management system, then before selecting one it must also evaluate the cost of not having a system in place. The questions to ask are:

・ What are the costs associated with your content being unavailable either through the Web site, or from your primary content storage systems?

・ What is the risk of having inaccurate content on your Web site?

・ How much does the insurance for that risk cost?

・ How do you recover and replace inaccurate content when your Webmaster is unavailable?

Features, Benefits and Advantages of a Content Management System (CMS)

In any content management system, there are many basic features that should be present so that the system works efficiently and saves money. These are:

・ The CMS database or the central repository for corporate content must be accessible to a wide range of technical and non-technical individuals. Its interface must be easy to use, and its architecture must fit within the framework defined by the IT organization. It should be menu driven so that the pages can be added and linked easily.

・ Creating, designing and deploying the web content should be automated according to the need of the organization.

・ It should reduce the time programmers have to spend on building custom forms for content management. The programmer can spend that time on the front-end of the web site.

・ The user interface design of the CMS should be changeable using templates. Different templates should be available for different levels of users.

・ A CMS management tool should be present to manage groups, users, and rights from a central point. There should be a facility to import groups and users from domains such as Windows or Unix.

・ It should be powered by the latest database and internet technologies and should be usable on any operating system and any computer platform.

・ CMS should facilitate better content security. It should control who is allowed to publish to the website, and who is allowed to see what content.

・ It should eliminate the constant and large volume of updates by redistributing the publishing work among the content authors, who can now publish and update their own content using easy-to-use, browser-based tools.

・ It should reduce Web site maintenance time and costs. Most of the maintenance operations should be automatic.

・ For content authors, facilities should be present for selecting different types of content from the built-in content library, cutting and pasting from other applications, setting publication dates and times for finished pages in advance so that they publish automatically, and easy, automatic page indexing and linking.

・ Content management system should provide tools for adding and managing content for administrators including Content Owners, Content Editors, Page Owners and Site Administrator.

・ The system should be able to scale in terms of performance, integration with other applications and the addition of custom features.

・ The system should provide multi user options for controlling the user privileges at multiple levels. The security keys should be provided to restrict the users to work according to their access rights.

・ The database model should not be rigid but it should be able to analyze the database structure and build the forms for database tables accordingly.
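Two of the features above, controlling who is allowed to publish and setting publication dates in advance, can be sketched together. This is a minimal illustration, not how any specific CMS works: the role names follow the three participants listed earlier, while the `ROLE_RIGHTS` table, the `User` and `Page` classes, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical mapping from role to permitted actions.
ROLE_RIGHTS = {
    "author": {"create"},
    "editor": {"create", "approve"},
    "publisher": {"publish"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


@dataclass
class Page:
    title: str
    publish_at: datetime  # publication date and time set in advance


def is_allowed(user, action):
    """Return True if any of the user's roles grants the action."""
    return any(action in ROLE_RIGHTS.get(role, set()) for role in user.roles)


def publish(user, page, now):
    """Check rights first, then report whether the page is live yet."""
    if not is_allowed(user, "publish"):
        raise PermissionError(f"{user.name} may not publish")
    return "live" if now >= page.publish_at else "scheduled"


alice = User("alice", {"author"})
bob = User("bob", {"publisher"})
page = Page("Online manual", publish_at=datetime(2004, 12, 3, 16, 0))

print(is_allowed(alice, "publish"))
print(publish(bob, page, now=datetime(2004, 12, 1, 9, 0)))
print(publish(bob, page, now=datetime(2004, 12, 4, 9, 0)))
```

Keeping the rights in a single table corresponds to managing groups, users, and rights from a central point: granting a role a new action is a one-line change.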

What’s a methodology?

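December 1, 2004

When reading papers or books I often come across the word "methodology". I only knew its Chinese translation, 方法论, but I was never clear about what actually counts as a methodology. A while ago, while reading papers, I came across an article by Arnon Sturm, Evaluation of Agent-Oriented Methodology, which gave me the answer I was looking for. A methodology comprises the following set of guidelines and activities: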

we refer to a methodology as the entire set of guidelines and activities:

  • a full lifecycle process;
  • a comprehensive set of concepts and models;
  • a full set of techniques (rules, guidelines, heuristics);
  • a fully delineated set of deliverables;
  • a modeling language;
  • a set of metrics;
  • quality assurance;
  • coding (and other) standards;
  • reuse advice;
  • guidelines for project management.

These are each associated with one of four major divisions: concepts and properties, notations and modeling techniques, process, and pragmatics.

A methodology covers multiple stages of the development lifecycle:

Each methodology may have elements that are useful to several stages of the development life cycle. In this paper, the lifecycle stages are defined as follows:

• Requirements’ gathering is the stage of the lifecycle in which the specification (usually in free text) of the necessities from the system is done.

• Analysis is the stage of the lifecycle that describes the outwardly observable characteristics of the system, e.g., functionality, performance, and capacity.

• Design is the stage of the lifecycle that defines the way in which the system will accomplish its requirements. The models defined in the analysis stage are either refined, or transformed, into design models that depict the logical and the physical nature of the software product.

• Implementation is the stage of the lifecycle that converts the developed design models into software executable within the system environment. This either involves the hand coding of program units, the automated generation of such code, or the assembly of already built and tested reusable code components from an in-house reusability library.

• Testing focuses on ensuring that each deliverable from each stage conforms to, and addresses, the stated user requirements.

Semantic Web Related Sites

November 4, 2004

1. Non-Commercial Uses

http://www.mindswap.org/
the first site on the Semantic Web

http://www.openrdf.org/
Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema.

http://protege.stanford.edu/
Protégé is an ontology editor and a knowledge-base editor.

http://jena.sourceforge.net/
Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS and OWL, including a rule-based inference engine.

http://simile.mit.edu/
Semantic Interoperability of Metadata and Information in unLike Environments

http://www.dspace.org/
DSpace is a groundbreaking digital library system that captures, stores, indexes, preserves and redistributes the intellectual output of a university’s research faculty in digital formats.

http://kowari.sourceforge.net/
The Kowari Metastore™ is an Open Source, massively scalable, transaction-safe, purpose-built database for the storage and retrieval of metadata.

http://4suite.org/index.xhtml
4Suite is a platform for XML processing and knowledge-management. It allows users to take advantage of standard XML technologies rapidly and to develop and integrate Web-based applications.

http://kaon.semanticweb.org/
KAON is an open-source ontology management infrastructure targeted for business applications. It includes a comprehensive tool suite allowing easy ontology creation and management, as well as building ontology-based applications.

http://librdf.org/
Redland is a set of free software packages that provide support for the Resource Description Framework (RDF).

http://www.ninebynine.org/RDFNotes/Swish/Intro.html
Swish is a framework, written in the purely functional programming language Haskell, for performing deductions in RDF data using a variety of techniques. Swish is conceived as a toolkit for experimenting with RDF inference, and for implementing stand-alone RDF file processors (usable in similar style to CWM, but with a view to being extensible in declarative style through added Haskell function and data value declarations). It explores Haskell as “a scripting language for the Semantic Web”.

http://www.w3.org/2000/10/swap/doc/cwm.html
Cwm is a general-purpose data processor for the semantic web, somewhat like sed, awk, etc. for text files or XSLT for XML. It is a forward chaining reasoner which can be used for querying, checking, transforming and filtering information. Its core language is RDF, extended to include rules, and it uses RDF/XML or RDF/N3 (see Notation3 Primer) serializations as required.

http://www.ontotext.com/kim/

KIM is a software platform for:

  • Semantic annotation of text.
    At more length: automatic ontology population and open-domain dynamic semantic annotation of unstructured and semi-structured content for Semantic Web and KM applications.
  • Indexing and retrieval (an IE-enhanced search technology).
  • Query and exploration of formal knowledge.
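Several of the tools above store or reason over RDF, and Cwm is described as a forward-chaining reasoner. The idea can be sketched on plain (subject, predicate, object) tuples. This is a simplified illustration only: it does not use the API of Cwm, Jena, Sesame, or any other tool listed, and the facts, predicate names, and rule functions are made up.

```python
def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new triple is derived."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known  # fixed point reached
        known |= new


def subclass_transitivity(triples):
    """If A subClassOf B and B subClassOf C, derive A subClassOf C."""
    return {
        (a, "subClassOf", c)
        for (a, p1, b) in triples if p1 == "subClassOf"
        for (b2, p2, c) in triples if p2 == "subClassOf" and b2 == b
    }


def type_inheritance(triples):
    """If X type B and B subClassOf C, derive X type C."""
    return {
        (x, "type", c)
        for (x, p1, b) in triples if p1 == "type"
        for (b2, p2, c) in triples if p2 == "subClassOf" and b2 == b
    }


# Hypothetical starting facts.
facts = {
    ("Protege", "type", "OntologyEditor"),
    ("OntologyEditor", "subClassOf", "Editor"),
    ("Editor", "subClassOf", "Software"),
}

closure = forward_chain(facts, [subclass_transitivity, type_inheritance])
for triple in sorted(closure):
    print(triple)
```

Forward chaining starts from the known facts and keeps adding consequences, so a later query such as "is Protege a kind of Software?" becomes a simple membership test on the derived set.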

2. Commercial Uses

http://www.tucanatech.com/

With Tucana Information Management Suite at the core of your Enterprise Information Integration (EII) strategy you bring all the power of enterprise knowledge together and put it in the hands of your engineers, scientists, bankers, salespeople or managers.

http://www.siderean.com/
Siderean’s flagship product, Seamark Server, is a faceted navigation platform that delivers an effective and economical standards-based solution that dramatically improves information access across distributed repositories of content, data, software components and digital assets in the enterprise.

http://aduna.biz/index.html

AutoFocus helps you to search and find information on your PC, network disks, mail boxes, websites and enterprise information sources.
