Dave Gudeman writes: "A developer who wants to make a piece of software available to others faces the daunting task of software delivery. There are several strategies for delivering software, primarily source code, machine binaries, and virtual machine binaries, each with its own advantages and disadvantages. I'm going to discuss each of the alternatives, then suggest a variation that is potentially better than any of the other solutions for commercial as well as Open Source software projects."

Copyright notice: All reader-contributed material on freshmeat.net is the property and responsibility of its author; for reprint rights, please contact the author directly.

The simplest solution for the user of the software is to deliver machine binaries with a system-dependent installation script so the user does not have to do anything but run the script. This method is expensive for the distributor, who has to test, maintain, and deliver multiple distributions and installation scripts. And, with this method, it is inevitable that some systems will not be supported. The disadvantages of this method may be summarized by saying that machine binaries are too dependent on the user's hardware and OS platform.

From the distributor's point of view, the easiest delivery method is bare source code, since it requires no work other than making the code available. However, this does not make the problems of distribution go away; it just moves them to the user. In order to compile the program, the user needs to have a development system compatible with the developer's, including a compiler, translators, libraries, and tools such as make and yacc. And even with the proper tools, if the user's hardware or OS is different from the developer's, the user may need to do various porting work. The disadvantages of this method may be summarized by saying that source code is too dependent on the developer's hardware, OS, and development platform.

So binary distributions are too dependent on the target system and source code distributions are too dependent on the development system. These platform dependencies can be largely eliminated by delivering virtual machine (VM) binaries. This method has been popularized by Java and its class files, but it has been successfully used in other systems for decades. VM binaries are independent of the target system in the sense that virtually any computer can have a VM interpreter. Typically, VM implementations are not independent of the development system since they use only a single programming language, but there is no particular reason why this should be the case. In fact, although the Java VM was intended to execute a single language (Java), many other languages can now be translated into JVM class files (see http://grunge.cs.tu-berlin.de/~tolk/vmlanguages.html). Programs written in these languages can be implemented on any machine that has a Java interpreter, making them relatively independent of both the target and development systems.

At first glance, it seems that Java class files might solve all of our software distribution problems. We can translate all programming languages into JVM class files and distribute all programs in that form. But the problem with interpreted software in general, and class files in particular, is that it is much slower and requires more memory than compiled software. JIT compilers can partially address the speed problem by generating machine code on the fly, but they cannot do serious optimization since that would take too long. This will prevent Java-style JIT implementations from ever really competing with native code solutions in performance.

Of course, there is no reason in principle why we could not deliver machine-independent class files and have the installation run an optimizing compiler on the class file to produce native code.
This would generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code. The reason is that much of the semantic information you need for effective optimization is lost in the translation to class file format. Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications.

So far, I have concluded that the most machine-independent form of software distribution is VM binaries, that it is necessary to optimize VM binaries, and that optimizing VM binaries on the user's machine is ineffective and inconvenient. The obvious alternative is to deliver VM binaries that have already been optimized. There is a problem with this as well: different target platforms require different optimization. But there are some optimizations that can be done at a higher level. For example, it is always a win to evaluate complex expressions at translation time instead of at run time, and it is always a win to remove dead or unreachable code. What is needed is a VM that allows more extensive system-independent optimization in the VM code. This sort of preoptimized code could be loaded by a program like a JIT compiler and executed at optimized native code speeds.

The challenge is to find a way of optimizing VM files without relying on machine-dependent optimization. To see how we could approach this, consider the process of translating a source language to a VM language and then to machine language for native execution.

    Source ------> VM ------> Machine

The source language is very different from the machine language, and the VM language is somewhere in between. The question is, where is it? Is it closer to the source language, closer to the machine language, or somewhere in the middle? At one end of the scale, the VM for a traditional compiler is the machine itself, and at the other end are common scripting languages where the VM is the source language. Java class files are near the middle.

The closer a VM is to the source language, the easier it is to do the source-to-VM translation. The closer the VM is to the machine language, the easier it is to do the VM-to-machine translation. For fast JIT compilation, we want the fastest possible VM-to-machine translation, so this suggests moving the VM toward the machine and away from the source. But going against this is the fact that we want to remain machine-independent, and that moves us away from the machine. Still, there are commonalities between machines. For example, most modern machines are register-based, and this suggests that the VM should be register-based so we can do preliminary register allocation in the VM.

We can view the process of translation more generally as one that involves many source languages and many machines, and the challenge is to find the point for the VM that allows for the most optimization and the simplest VM-to-machine translation for the largest set of machines. Let's call such a VM an Optimizable Portable Instruction Set or OPIS.

I have studied these issues a little as part of implementing an optimizing compiler, and I am confident that a reasonably good OPIS can be designed. However, this is a research project that would require expertise in implementing many different programming languages and in writing compilers for many different machines. Is this something the Open Source community is capable of?
Can the community model be applied to a large research project or is research too different from development for the model to carry over? In a sense, the community model is very similar to the academic research model; the work is distributed over many researchers each doing what he or she is most interested in, and the result is often rapid progress in the state of knowledge. I would like to hear from anyone who might be interested in participating in such a research project and anyone who knows of related work in the area.
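
To make the machine-independent optimizations mentioned in the article more concrete (evaluating expressions at translation time and removing dead code), here is a minimal Python sketch over a made-up register-based instruction list. The `Instr` type, the op names (`input`, `const`, `add`, `mul`, `ret`), and the restriction to straight-line code without branches or side effects are all assumptions for illustration; none of this is part of an actual OPIS design.

```python
from dataclasses import dataclass

# Hypothetical three-address instruction for an imagined register-based VM.
# op is one of "input", "const", "add", "mul", "ret" (names invented here).
@dataclass(frozen=True)
class Instr:
    op: str
    dest: str = ""
    args: tuple = ()

def fold_constants(code):
    """Replace add/mul whose operands are all known constants with a const."""
    known = {}  # register -> value known at translation time
    out = []
    for ins in code:
        if ins.op == "const":
            known[ins.dest] = ins.args[0]
            out.append(ins)
        elif ins.op in ("add", "mul") and all(a in known for a in ins.args):
            x, y = (known[a] for a in ins.args)
            value = x + y if ins.op == "add" else x * y
            known[ins.dest] = value
            out.append(Instr("const", ins.dest, (value,)))
        else:
            known.pop(ins.dest, None)  # dest now holds a run-time value
            out.append(ins)
    return out

def eliminate_dead_code(code):
    """Drop instructions whose result is never read (backward liveness scan)."""
    live = set()
    kept = []
    for ins in reversed(code):
        if ins.op == "ret":
            live.update(ins.args)
            kept.append(ins)
        elif ins.dest in live:
            live.discard(ins.dest)
            # Only register names (strings) are uses; const payloads are not.
            live.update(a for a in ins.args if isinstance(a, str))
            kept.append(ins)
        # Otherwise the result is never used, so the instruction is dropped.
    kept.reverse()
    return kept

program = [
    Instr("input", "r0"),
    Instr("const", "r1", (6,)),
    Instr("const", "r2", (7,)),
    Instr("mul", "r3", ("r1", "r2")),  # foldable: both operands are constants
    Instr("add", "r4", ("r0", "r3")),  # depends on run-time input
    Instr("mul", "r5", ("r0", "r0")),  # dead: r5 is never read
    Instr("ret", "", ("r4",)),
]

optimized = eliminate_dead_code(fold_constants(program))
for ins in optimized:
    print(ins)
```

Neither pass consults anything about the target machine, which is the point: work of this kind could be done once by the distributor, leaving only the machine-specific steps for the user's system.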
Dave Gudeman (dgudeman@azstarnet.com) received his PhD in computer science from the University of Arizona in 1994. His research areas involved programming language design and implementation and his dissertation involved the design of an optimizing compiler for a concurrent constraint programming language. He is currently working for a small software company designing databases and GIS applications. His contributions to free software include the Janus Compiler (for research purposes only) and a Java XML reader/writer called Harp (the source code for Harp will be posted on SourceForge Real Soon Now).
[»]
Re: Reprise from the author

I thought I'd have a go at clarifying the reason for my original response. I have no interest in a flame war and am happy to agree to disagree, but I at least want to point out a few things so that in the future, we can all have a more productive discussion about the topic you presented.

"Todd Fast wishes to berate me for not mentioning runtime code specialization."

My intent was not to berate, and I did not intend what I wrote to be derisive. However, I did write my response to attempt to qualify your otherwise unqualified statements regarding JIT compilation, namely these largely unsupported statements:

"But the problem with interpreted software in general, and class files in particular, is that it is much slower and requires more memory than compiled software. JIT compilers can partially address the speed problem by generating machine code on the fly, but they cannot do serious optimization since that would take too long. This will prevent Java-style JIT implementations from ever really competing with native code solutions in performance."

Furthermore, you use the above statement as a conclusion to bolster other conclusions:

"So far, I have concluded that the most machine-independent form of software distribution is VM binaries, that it is necessary to optimize VM binaries, and that optimizing VM binaries on the user's machine is ineffective and inconvenient."

While I will agree with you that this was not the main thrust of your editorial, you did say it, and if you include it as a premise to your conclusion, it's fair game for criticism. You also had this to say:

"Of course, there is no reason in principle why we could not deliver machine-independent class files and have the installation run an optimizing compiler on the class file to produce native code.
This would generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code. The reason is that much of the semantic information you need for effective optimization is lost in the translation to class file format. Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications."

I call attention to these excerpts:

"...generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code..."

"Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications."

More below.

"In the first place I don't think any production JIT compilers do this"

There are at least two. The first is Javasoft's Hotspot product, which is the default VM technology included with JDK1.3. It is also available for JDK1.2 as an add-on. Hotspot has been in production for something like 1-1.5 years, I believe. The other is Sun's Exact VM, which is something of a competitor to Hotspot. I don't know its current release status, but I do know that it has been used successfully in commercial production environments, with significant benefit.

"in the second place this technique benefits only a small minority of programs"

What you mean by "small minority" is unclear, but it is indisputable that the vast majority--if not all--of modern Java programs benefit from this technique, given its inclusion in every shipped Java VM. Anecdotally, in every program that is CPU bound, I have seen Hotspot only improve performance, in many cases by an order of magnitude.

"and in the third place, static optimization is still critical in systems that do implement this technique, all of which leave my original position unaffected by your response."

I agree, as does everyone else in the field, that static optimization is still necessary and advantageous. However, given your premises that:

"...an optimizing compiler...[would] generally produce better code than a JIT compiler could, but it would still not be comparable to what a high-quality optimizing compiler could do with the original source code..."

and

"Another problem with this solution is that optimization can take a very long time, so running an optimizing compiler is impractical for applets and inconvenient for installing traditional applications."

I think it's fair to bring to light facts relevant to the conclusions those premises support. Given the information to which I've been exposed and the current state of the art, it seems that the above premises are at least flawed, if not outright incorrect. It is clear--at least to me--how this new information influences your position.

"And may I suggest, Mr. Fast, that you are inviting flames by taking a quote out of context, introducing a completely new subject, and then urging the author to "learn more about such technologies before dismissing their advantages out of hand". I could hardly have dismissed these technologies when I never mentioned them."

Although it may not have been your original intent to discuss these factors as relevant to your conclusion, I believe I've outlined above just how integral your assumptions about JIT technology are to your overall position. I don't disagree with your goal of outlining possibilities for VM code as a software distribution mechanism. I simply want to see it discussed with all the relevant information taken into account.

Finally, let me say that despite disagreeing violently with nearly every other conclusion you present in your article, "Java: a dissenting opinion", I agree with and very much like your analysis of the true shortcomings behind the use of pointers--it is perhaps the first time I've seen this subtlety outlined.
[»]
Why not sources?
I think VM code will result in the same problems we have with sources:
What's the win? What does Open Source gain? Distributing half-binary versions of the program? That's not Open Source! BTW, for low-level actions you'll get better performance by using the advantages of a specific OS, so you probably want to use the native interfaces anyway. Conclusion: It doesn't solve the problem! It just moves it to the binary realm (try to fix a bug there!). Using portable libraries like GLib/OpenGL/OpenAL and staying close to POSIX is the best way to go, IMHO.
[»]
Reprise from the author
A few comments to the comments:
[»]
Slim binaries.
Have a look at http://www.ics.uci.edu/~franz/ (I recommend reading the document about 'slim binaries'). It uses a very different approach: a tree representation of the code instead of virtual machine instructions. This makes the code much easier to optimize (and also to compress!).
[»]
Learn more about JIT technologies before dismissing them
The article mentions several times that JIT compilers, such as those used
in the Java VM, are only partially effective at optimizing VM code, and
not nearly as effective as a static compiler. The conclusion seems to be
this:
[»]
The big problem with a VM
The big problem with a VM is that one day, just the machine code won't cut
it. You'll need some kind of API, and then you'll run into exactly the same
problem as Java does today, and the main reason configure scripts sometimes
fail (as you note in the article). No matter how standardized this API is,
you'll need upgrades some time in the future, and then backwards
compatibility is lost, and you really gained nothing. Look at what has
happened to Java (lots of Microsoft and some Netscape extensions that
everybody uses without ever thinking that it might break compatibility).
So, as long as you've got an Open Source project, I'd still say an
autoconf-like solution is the way to go.
/* Steinar */
[»]
.NET
I don't mean to be flame bait, but this sounds _just_ like Microsoft's proposed .NET platform. Once again, the open source community is copying Microsoft. This is a pattern I've seen a lot: Microsoft comes out with a cool idea, the open source community slams it and says it sucks (because, of course, *anything* from Redmond must suck -- there are no *real* hackers at Microsoft after all), then the community goes on to copy it. I'd *really* like to see some real innovation come from the community. I know you all hate Microsoft, but they do spend more than a billion $/yr on R&D and they do come up with some really neat ideas. With thousands of pooled minds, I'd think the open source community could come up with some Pretty Neat Ideas. I challenge you to one-up Microsoft (and Sun and IBM and ...).
[»]
The Java VM is not a problem.

"JIT compilers can partially address the speed problem by generating machine code on the fly, but they cannot do serious optimization since that would take too long. This will prevent Java-style JIT implementations from ever really competing with native code solutions in performance."

I'm not sure what you mean by "Java-style JIT compilation". Java runtimes have byte-code interpreters, just-in-time compilation, batch compilation, and other compilation strategies. There isn't a single approach. And there is sufficient commercial interest in this area that you can bet that everything that can be tried is being tried. I think it would be worth thinking about better compilation strategies for open source implementations of Java. Kaffe's JIT probably needs help, and maybe there are improvements possible to GCJ as well (inlining?).

I think designing a new VM, however, is not a good idea. People have tried in the past (some of the efforts may even be open source now). The success of the high-level Java VMs where the low-level approaches have failed for decades gives us strong indications of which of the two approaches actually works well in practice.

Java runtime environments have become very efficient. JIT compilation can use runtime statistics to improve dynamic dispatch and inlining, something no batch compiler can do. In some areas, the high-level representation actually is very beneficial. In fact, even the Java VM already loses a little bit of information relative to the source code that would be useful for compilers (as experience with GCJ shows), so eliminating even more information might not be good.

The performance issues that remain in Java are mostly semantic limitations, not compiler or VM limitations: lack of expanded structures, lack of locatives, the immutability of strings, some limitations with arrays, and some problems with the number system.
(Some of the limitations are deliberate and probably a good overall tradeoff.) Even if you were to spend the enormous effort of coming up with a new VM, if you compiled Java to it, you probably wouldn't do significantly better than the best current Java environments. So, I think it would be better to work on improving the existing open source Java environments than to start from scratch with something that, based on past experience, probably won't succeed anyway.
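
As a rough illustration of what "using runtime statistics to improve dynamic dispatch" can mean, here is a minimal Python sketch of a call site that profiles the receiver types it sees and, once one type has been seen often enough, skips the method lookup behind a cheap type guard. The `CallSite` class, its `threshold` parameter, and the `Circle`/`Square` classes are invented for this example; it does not describe how Hotspot or any other real VM is implemented.

```python
from collections import Counter

class CallSite:
    """One dynamic call site that profiles receiver types (hypothetical sketch)."""

    def __init__(self, method_name, threshold=3):
        self.method_name = method_name
        self.threshold = threshold  # slow calls seen before specializing
        self.seen = Counter()       # receiver type -> number of slow-path calls
        self.cached_type = None
        self.cached_func = None
        self.fast_hits = 0

    def call(self, receiver, *args):
        rtype = type(receiver)
        # Fast path: a cheap type guard, then a direct call with no lookup.
        if rtype is self.cached_type:
            self.fast_hits += 1
            return self.cached_func(receiver, *args)
        # Slow path: full dynamic lookup, plus profiling.
        self.seen[rtype] += 1
        func = getattr(rtype, self.method_name)
        if self.seen[rtype] >= self.threshold:
            # Specialize the site for the type the profile says is common.
            self.cached_type, self.cached_func = rtype, func
        return func(receiver, *args)

class Circle:
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3 * self.r * self.r  # crude integer stand-in for pi * r^2

class Square:
    def __init__(self, s):
        self.s = s
    def area(self):
        return self.s * self.s

site = CallSite("area")
circle_areas = [site.call(Circle(2)) for _ in range(5)]
square_area = site.call(Square(3))  # guard fails, falls back to the slow path
print(circle_areas, square_area, site.fast_hits)
```

A batch compiler has no such profile at build time, so it has to guess which receiver types are common; a runtime can observe them and fall back safely when the guard fails.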
[»]
There are many options but Open Standards are best.

Yeah, I always said that perl was the way to go. Just a shame Linux doesn't support device drivers written in perl... Seriously though, this is not a new problem and it may be worth reviewing the previous attempts at addressing this very real problem. Off the top of my head, and in no particular order:

Standards: If your target audience is running Posix compliant systems that adhere to the Unix file system standards, then you should be OK distributing ELF binaries (or at least .o files). This approach is perhaps exemplified by the Unix standardization process and was applied widely by the US government (I still remember VMS being certified Posix 1.1 compliant). There are many advantages to this approach, including the ability to develop and deliver a wide range of solutions (often including quite low-level code) and a certain comfort that these systems can integrate with other solutions on the platform (the by-line of The Open Group is "...enabling enterprise integration" and that is exactly the other point of standards). The disadvantage is primarily that standards are unevenly implemented by the vendors. Often the standards themselves encourage this: Posix, to name an example, has a whole set of sub-standards (numbered .1, .2, ... .n) covering things from a user shell to real-time computing extensions and much more. Only the first is required, so you can put "Posix compliant" on your marketing literature as long as you supply bash. Not very helpful to developers.

Interpreted languages: I guess the first was probably sh, but this approach gained widespread use with perl. The idea is simply that you distribute interpreted code and assume or require that your audience have the required interpreter on their systems.
Nowadays perl has developed into a very rich language, and if it isn't enough then there are many more to choose from: for example Tcl/Tk or scheme, to name but a few of the more powerful languages with sophisticated GUI support. This is a fine approach as far as it goes. These languages typically allow you to build very complex solutions much faster than you would be able to with C. However, they are not suitable for low-level programming (e.g. device drivers) or programming with real-time constraints. A further disadvantage is that you are again relying on the user to have the latest interpreter. I only have Perl 4 but your snazzy application requires Perl 5.005_03 with half-a-dozen packages that I do not have. To some extent this is always an issue. Anyone who has distributed Unix binaries will know of the issues with the versions of the C library. A huge chunk of code is shared between processes in the form of a library, and if you do not have the latest thread-safe patch-level then my program may just die horribly.

Java, as suggested in the article, to me has all the same characteristics as the interpreted solution: there are versions and optional packages to deal with. In fact, to my mind it is exactly like the interpreted solution, except for some trivial performance optimization.

So I guess that I'm saying there is no magic bullet. To my mind, the best approach is summarized by the signature of one of the posters on /.. It goes something like: Open standards. Open source. Open minds. There is no approach that solves the very real issues that the original poster highlights, but our best bet is to encourage and adhere to open standards. Not only will it make it easier to distribute your solutions, but it will also help to ensure that multiple solutions from multiple sources will co-exist on the target system and even integrate seamlessly.
We are not there yet, but support your local standardization organisations (they need it: there are few tasks more tedious than a standardization process) and we will all benefit.
[»]
Not to be an MS lackey...
As I understand it, this is one of the things that MS's .NET platform is
meant to provide. The installer will be able to compile the .NET platform
code with as many optimizations as possible. And MS's plan (as they'd
have us believe) is to have all languages compile nicely to .NET. Now, had
this platform (I won't say the idea, because I'm sure the idea has been
out there for many years) not come from Microsoft, I'd be more willing to
trust the supplier to follow through on its plans. In many ways, C#/.NET
could be the evolution of Java...
[»]
OSF researched this with ANDF
Several years ago, the Open Software Foundation did some research into
"Architecture Neutral Distribution Format" as a way of easing
binary compatibility between different OSes/Processors. The URL
is http://www.osf.org/andf, though
the link seems to be down now.
[»]
Why JIT?
Why do you need a JIT? JITs are only required when you want to run the binary for the first time. They are important for Java because you want to be able to download your code and then execute it immediately and not wait a few minutes while it's recompiled. If you're going to 'install' an application, you could have some program that goes through and takes your VM code and recompiles it for the target platform. This recompiler would of course be platform specific, but that's OK - it came with the platform anyway, and updating the recompiler would work for *all* applications. You could also have your runtime write out profiling information and later do a second recompilation step to further optimise your code, maybe have a 3am cron session to choose a series of binaries and reoptimise them.
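
One way to picture the install-time scheme described above is a cache of native builds keyed by the VM code, the target platform, and the recompiler version, so that a build happens once at install, is reused on every later run, and is replaced when profiling data arrives. The Python sketch below is only that picture: `NativeCache`, `cache_key`, and `fake_recompile` are hypothetical names, and the "recompiler" is a stand-in that merely tags the bytes rather than generating machine code.

```python
import hashlib

def cache_key(vm_code, platform_id, recompiler_version):
    """Identify one native build by VM code, target platform, and recompiler."""
    h = hashlib.sha256()
    for part in (vm_code, platform_id.encode(), recompiler_version.encode()):
        h.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.hexdigest()

class NativeCache:
    """Stores recompiled binaries so each VM program is compiled only once."""

    def __init__(self, recompile, platform_id, recompiler_version):
        self.recompile = recompile  # callable(vm_code, profile) -> native bytes
        self.platform_id = platform_id
        self.recompiler_version = recompiler_version
        self.store = {}
        self.compiles = 0

    def _key(self, vm_code):
        return cache_key(vm_code, self.platform_id, self.recompiler_version)

    def _build(self, vm_code, profile):
        native = self.recompile(vm_code, profile)
        self.store[self._key(vm_code)] = native
        self.compiles += 1
        return native

    def load(self, vm_code):
        """Return native code, compiling only if no build is cached yet."""
        key = self._key(vm_code)
        if key in self.store:
            return self.store[key]
        return self._build(vm_code, None)

    def reoptimize(self, vm_code, profile):
        """Second recompilation step, e.g. run from a nightly cron job."""
        return self._build(vm_code, profile)

def fake_recompile(vm_code, profile):
    # Stand-in for a real platform-specific recompiler (an assumption).
    tag = b"profiled" if profile else b"plain"
    return tag + b":" + vm_code

cache = NativeCache(fake_recompile, "linux-x86", "1.0")
app = b"\x01\x02\x03"
first = cache.load(app)    # install time: compiles
second = cache.load(app)   # later runs: served from the cache
tuned = cache.reoptimize(app, {"hot_functions": ["main"]})
print(cache.compiles, cache.load(app) == tuned)
```

Because the recompiler version is part of the key, upgrading the recompiler naturally causes every application to be rebuilt the next time it is loaded, which matches the point that one update benefits all applications.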
[»]
VM is available now!
That's exactly the approach Amiga is taking with their new SDK: http://www.amiga.com/products/SDK.shtml It offers both a fast JIT and an Amiga VM which will run those binaries on all platforms, small and large. If this article intrigues you, consider the Amiga SDK for your future development.

Also, MUMPS has been doing that for decades, too. It is an ANSI-standard language with its own complete operating system, independent of the operating platform it runs under. Both code and data are interchangeable with any other MUMPS system and vendor. InterSystems Cache (http://www.e-dbms.com) extends the language to be open to modern platforms and languages with its Objects, WebLink, and ODBC/SQL interfaces.