Information retrieval from $HOME
by Ajay Shah, in Editorials - Thu, Aug 30th 2001 00:00 PDT
Like everyone else, when I first encountered tree directory
systems, I thought they were a marvelous way to organize
information. I've been around computers since 1983, and have
staunchly struggled to keep files and directories neatly organized. My
physical filing cabinet has always been a mess, but I clung to the
hope that my hard disk would be perfect.
Copyright notice: All reader-contributed material on freshmeat.net
is the property and responsibility of its author; for reprint rights, please contact the author
directly.
For many years, I could draw my full tree directory from
memory. Things have changed; I'm doing more things than I can
track. Today, my $HOME is 2.4k directories, 43k files, and 1.3G bytes
(this is almost all plain ASCII files -- no MS Office, no multimedia
-- so 1.3G is a lot). My present filesystem has been with me without
interruption since 1993, and there are old things in there that I can
scarcely remember. Now, I often wander around $HOME like a stranger,
using file completion and "locate" to feel my way around. I recently
needed some HTML files that I was sure I had once written, but I
didn't know where they were. I found myself reduced to saying:
$ find ~ -name '*.html' -print | xargs egrep -il string
which is a new low in terms of having no idea where things might
be.
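One weakness of the pipeline above is that xargs splits its input on whitespace, so file names containing spaces can be mangled. A rough sketch of the same search in Python follows; the function name find_matching_files is made up for illustration, and Python's regular expressions differ slightly from egrep's, so treat it as an approximation rather than an exact equivalent.

```python
import os
import re


def find_matching_files(root, suffix, pattern):
    """Return sorted paths under root that end in suffix and contain pattern.

    The match is case-insensitive, like egrep -i, and only file names are
    returned, like egrep -l.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    if any(regex.search(line) for line in handle):
                        matches.append(path)
            except OSError:
                # Unreadable file: skip it rather than abort the whole search.
                continue
    return sorted(matches)
```

Calling find_matching_files(os.path.expanduser("~"), ".html", "string") should list roughly the same files as the command above, with no special handling needed for odd file names.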
This article is a plea for help. We're all used to devoting effort
to problems of information retrieval on the net. I think it's worth
worrying about inner space. What lies beneath, under $HOME? How can
relevant information and files be pulled up when needed? How can we
navigate our own HOMEs with less bewilderment and confusion? Can
software help us do this better? I know nothing about the literature
on information retrieval, but this scratches my itch.
Multiplicity of trees
We have accumulated three different tree systems for organizing
different pieces of information:
- The filesystem
- Email folders
- Web browser bookmarks
This is a mess. There should be only one filesystem, one set of
folders.
Email is a major culprit. Everyone I know uses a sparse set of
email folders and an elaborate filesystem, so we innately cut corners
in organizing email.
We really need to make up our minds about how we treat email. Is
email a channel, containing material which is in transit from the
outside world to the "real" filesystem? In this case, the really
important pieces of mail will get stored in their proper directory
somewhere, and all other pieces of email will die. I have tried to
achieve this principle in my life, with limited success.
Or is email permanent (as it is for most people), in which case
material on any subject is fragmented between the directory system and
email folders? If so, can email folders automatically adopt the
organization of the directory system? Can email files be placed
alongside the rest of the filesystem?
Web browser bookmarks are a third tree-structured organization
which should not exist. It's easy to imagine a metadata.html file
in every directory, with the bookmarks stored
there. The browser would inherit the tree directory structure of
$HOME, and when sitting inside any one directory, the pertinent
metadata would be handy.
Dhananjay Bal Sathe pointed out
to me another source of escalation of the complexity of
filesystems. This only affects users of software from Microsoft, so
I'd never encountered it. It is MS's notion of "compound files",
which are objects which look like normal files to the OS but are
actually full directory systems (I guess they're like tarfiles). Since
the content is hidden inside the compound files, you cannot use all OS
tools for navigating inside this little filesystem, only the
application that made the compound file. He feels that if compound
files had been treated as ordinary directories of the filesystem, it
would have been a "simple, beautiful, elegant" and largely acceptable
solution instead of the mess which compound files have created.
Non-text files
If you use file utilities to navigate and search inside the
filesystem, you will encounter some email. I use the "maildir" format,
which is nice in that each piece of email lies in a separate
file. However, MIME formats are a problem. When useful text is kept in
MIME form, it's harder for tools to search for and access it.
MIME is probably a good idea when it comes to moving documents from
one computer to another, but it seems to me that once email reaches
its destination, it is better to store files in their native
format.
In my dream world, each directory has all the material on a
subject (files, email, or metadata), and grep would
work correctly, without being blocked by MIME-encoded files.
Geetanjali Sampemane pointed
out that this is related to the questions about content-based
filesystems, and suggested I look at a paper by Burra Gopal and Udi
Manber on the subject (ask Google for
it).
PDF and postscript documents
Postscript and PDF have worked wonders for document transmission
over the Internet, but this has helped escalate the complexity of
inner space:
- As with MIME, .ps and .pdf files cannot be searched with
regular expressions the way text files can.
- An interesting and subtle consequence of the proliferation of
.ps and .pdf files in my filesystem is that a larger fraction of
the files there are alien. In the olden days, every file that was in
my filesystem was mine. It used my file naming conventions, etc., so
when I wandered around my filesystem, I knew my way. Today,
there are so many alien files hanging around that it reduces my
confidence that I know what is going on.
- Every now and then, I notice a .pdf file "which is going to be
invaluable someday", and snarf it. If I'm lucky, it has a sensible
filename, and if I'm lucky, I'll place it in the correct place in my
filesystem. In this case, there's a bit of a hope that it'll get
used nicely in the future. Unfortunately, a lot of people use
incomprehensible names for .pdf files, such as ms6401.pdf,
seiler.pdf, D53CCFF4C9021C19988841169FB6FD6EC1D56F711.pdf, and
sr133.pdf. I find that interactive programs like Web browsers,
email programs, etc. are clumsy
at navigating tree directories, so my habit is to save into /tmp,
then move the file using the command line. Sometimes, I'm in too much
of a hurry, and this gets messed up. Now and then, I place an
incoming file into $HOME/JUNKPDF, hoping that I'll get around to
organizing it later.
While I'm on this subject, I should describe a file naming
convention I've evolved which seems to work well. I like it if a file
is named Authoryyyy_string.pdf; this encodes the last name of the
author, the year, and a few bytes of a description of what this file
is about. For example, I use the filename
SrinivasanShah2001_fastervar.pdf for a paper written by
Srinivasan and Shah in 2001 about doing VaR faster.
I also take care to use this Authoryyyy_string as the key in my
.bib file, so it's easy to move between the bibliography file and the
documents. I often use regular expression searches on my bibliography
file, and once I know I want a document, I just say locate
Authoryyyy to track it down.
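A convention like this is easy to check mechanically. The Python sketch below assumes the pattern is exactly: letters, a four-digit year, an underscore, a description, and a .pdf extension. The regular expression and the function name parse_paper_filename are illustrative assumptions, not part of the convention as described.

```python
import re

# Assumed shape of the convention: Author(s) + yyyy + "_" + description + ".pdf"
NAME_RE = re.compile(r"(?P<authors>[A-Za-z]+)(?P<year>\d{4})_(?P<topic>[^./]+)\.pdf")


def parse_paper_filename(filename):
    """Split an Authoryyyy_string.pdf name into its parts, or return None."""
    match = NAME_RE.fullmatch(filename)
    if match is None:
        return None
    authors = match.group("authors")
    year = match.group("year")
    topic = match.group("topic")
    return {
        "authors": authors,
        "year": int(year),
        "topic": topic,
        # The same string doubles as the key in the .bib file.
        "bib_key": f"{authors}{year}_{topic}",
    }
```

Names such as ms6401.pdf or seiler.pdf do not fit the pattern and yield None, which makes it simple to list the files in a directory that still need renaming.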
Some suggestions
I'm not an expert on information retrieval, so these are just some
ideas on what might be possible, from a user perspective.
In summary, people working in information retrieval are focused on
searching the Web, but I think we have a real problem lurking in our
backyard. Many of us are finding it harder and harder to navigate
inside our HOMEs and find the stuff we need. I think it's worth
putting some effort into making things better. There is a lot that ye
designers of software can do to help, ranging from putting file
completion into Mozilla to new ideas in indexing tools.
Author's bio: Ajay Shah is an associate professor at IGIDR, Bombay. His research is
in financial economics, including the applications of information
technology for designing financial products, markets, and trading
strategies. You can find out more at http://www.igidr.ac.in/~ajayshah/.
Comments
Reducing everything to a file and using file filters
by Tobias Horney - Oct 16th 2001 18:28:57
Maybe part of the solution to the problem that bookmarks cannot easily be
integrated into the file system would be to write an http file system
(maybe one already exists?). I am thinking you might mount it under, say,
/http and reach web documents by standard URLs like
/http/user.passwd@host:port/path/filename.xyz where of course user, passwd,
port and path would be optional. This way you could create symlinks
anywhere you want to save your web documents (enabling you to choose any
filename you like) that point to somewhere under /http and then use them as
(probably, but not necessarily exclusively, read-only) files.
This way you would always be using the latest version of the file, and all
tools that work with files would also work transparently with web
documents.
Of course you can imagine the http file system automatically finding stale
links, doing regular download/backup of documents in the filesystem, etc.,
and of course there is no reason why protocols other than http
couldn't be used the same way.
...and of course another useful protocol to implement a file system for
would be standard mail folders, but from the comments on the
article there seem to be some mail storage standards for reaching the
mail folder as a part of the file system.
Well, reducing bookmarks and mail folders to files might not be what we
want, but it would be rather easy to do the indexing etc. for people
wanting to be able to search their HOME.
When it comes to searching/indexing .ps and .pdf files and the like I do
not understand why the indexer/searcher cannot apply filters based on
filename extensions, magic numbers or mime type for example.
Wanting to use text tools like grep on .ps and .pdf files and the like can
be solved in at least two ways:
1) make the tools (e.g. grep) smarter and allow for a filter to be applied
to the files grep searches before grep does its search
2) make your favourite shell recognize a special character that you, for
example, add just before filenames you want to read as plain text at the
command line. The shell then applies the filter mechanism (e.g. pdf2txt or
ps2txt) and sends the filtered file (probably stored in something like
/tmp/shellfiltermechanism/path_to_the_real_file) into the command you
entered (for example grep). Of course grep would report matches in
/tmp/shellfiltermechanism/foo/bar/fie.pdf, but you would know you should
look at /foo/bar/fie.pdf. This is of course not a perfect solution, but
people using grep and the special shell char would know how to handle it,
so for us hackers it might do? :)
Just some thoughts...
-- /Tobias
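The first option in the comment above, a grep that applies a filter chosen by file extension, could look roughly like the following Python sketch. The converter commands pdftotext and ps2ascii are assumed to be installed and stand in for the pdf2txt and ps2txt named in the comment; the function names and the "{path}" placeholder are hypothetical. Matches are reported against the real path, so no temporary copy under /tmp is involved.

```python
import os
import re
import subprocess

# Assumed converters: pdftotext (poppler/xpdf) and ps2ascii (Ghostscript).
# "{path}" is a placeholder invented for this sketch.
DEFAULT_FILTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],
    ".ps": ["ps2ascii", "{path}"],
}


def extract_text(path, filters=None):
    """Return the text of path, run through a converter if one is registered."""
    filters = DEFAULT_FILTERS if filters is None else filters
    extension = os.path.splitext(path)[1].lower()
    command = filters.get(extension)
    if command is None:
        # No filter registered: treat the file as plain text.
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    argv = [part.replace("{path}", path) for part in command]
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def grep_filtered(pattern, paths, filters=None):
    """Return (path, line_number, line) for every filtered line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in paths:
        text = extract_text(path, filters)
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line))
    return hits
```

Choosing the filter by magic number or MIME type, as the comment also suggests, would only change how the converter is looked up in extract_text.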
command line vs gui
by grom - Oct 3rd 2001 06:06:56
In regard to being able to call up the command-line interface when saving:
why not integrate command-line functionality into the GUI? For example, why
not have path name completion in the filename text box, so a user can use
either the GUI or the command line? I feel that GUIs need to have the power of
command lines, but I have yet to witness this. Another idea I have is
a find-file dialog (like the one in Windows Explorer) that allows regular
expressions. Why give up the command line for a GUI when we can
have the command line built into the GUI?
Isn't Oracle IFS the answer (albeit an expensive one)
by Raul M. Jorja - Sep 19th 2001 01:26:32
I do not have any experience with Oracle IFS (Internet File System), but from
what I have heard and read about it, it seems to be the answer.
It saves everything in an RDB (obviously Oracle), so you can add tags,
metadata, etc. (it even handles XML) and then you can query it via http,
nfs, smb, imap, etc. -- any protocol!
See also.. Story on OSNews
by Allan Fields - Sep 7th 2001 11:27:43
http://www.osnews.com/story.php?news_id=69
Are Linux meta-data enabled filesystems ready for production? Never hurts
to try out something new on a test machine. I plan to look at and compare
the various file systems discussed in this article.
On another note, XML databases are very interesting indeed. Take a look
at these handy resources: http://www.rpbourret.com/xml/XMLAndDatabases.htm,
http://www.rpbourret.com/xml/XMLDatabaseProds.htm.
These pages describe XML database solutions and talk about how XML and
databases fit together. Various XML databases are also mentioned, and at
this point there is a large, large pool of XML database projects, both
commercial and open source. Some examples of XML/OO database products:
Prowler, Ozone, etc. XML databases might be a key element of how to
get this to the user level in existing solutions. Also take a look at the
section describing DTD schema translation, in what is described as
'object-relational mapping' method of document-centric XML files to object
frameworks and then to the relational database backend such as say an SQL
database like PostgreSQL perhaps.
Since I'm no XML expert some of this is new to me.
Well hope someone still reads through this thread, it seems a bit dated by
now.
---
Allan Fields
MacOS X "bundles"
by Ask Bjørn Hansen - Sep 6th 2001 07:19:43
In MacOS X we have "compound files" which can be navigated like
the directory structures they really are. In the MacOS Finder they look
and work like a single file, but after opening a shell you can easily see
what's inside.
- ask
Another ERP-System
by Holger von Ameln - Sep 4th 2001 04:16:52
for those of you in need for a linux based ERP-System.
Have a look at http://www.pentaprise.de.
Problems with superstrings
by Marco Schmidt - Sep 3rd 2001 23:18:23
The idea is good, there is just a minor problem with character encoding. I
don't say this cannot be solved, but I noticed that with latex special
characters often get described in a visual way - e. g. a small 'a' with
two dots on top of it. While this results in nice output, the information
that this really is an umlaut 'a' (as in German Bär or
Jägermeister) gets lost (does not get stored in the dvi or pdf file
that is the result of the (pdf)latex run). So searching for anything with
ä in it doesn't work anymore.
Indexing PDFs
by Marco Schmidt - Sep 3rd 2001 23:10:58
I don't think having a bib file for every PDF is a good solution. PDF
provides the possibility for metadata inclusion (pdflatex lets you do
this, and I'm sure Adobe's own tools as well). Unfortunately, almost
nobody is using these correctly. You can store title, author, date,
keywords and probably more in a PDF. Command line tools like pdfinfo (or
CTRL-D in Acrobat Reader) show this information. Additional files always
get lost or do not get updated, so store the information in the files!
Re: Indexing PDFs
by Robert R. Russell - Mar 5th 2005 20:18:22
> I don't think having a bib file for
> every PDF is a good solution. PDF
> provides the possibility for metadata
> inclusion (pdflatex lets you do this,
> and I'm sure Adobe's own tools as well).
> Unfortunately, almost nobody is using
> these correctly. You can store title,
> author, date, keywords and probably more
> in a PDF. Command line tools like
> pdfinfo (or CTRL-D in Acrobat Reader)
> show this information. Additional files
> always get lost or do not get updated,
> so store the information in the files!
Since the authors of pdf files do not always include metadata in the
file, design a tool that handles metadata both ways, putting metadata into
a pdf from a file or pulling the metadata out of the pdf into a file. If
an extensible standard for the separate metadata files were made, a whole
set of tools could be made that handles putting and pulling metadata
for almost any file type that includes metadata: pdf, mp3, ogg, mpeg, avi,
and probably others.
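One half of such a tool, the separate metadata file, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the ".meta.json" suffix, the JSON format, and the embedded_reader callable, which stands for whatever format-specific code would read metadata stored inside the file itself. Writing metadata back into a pdf is left out, since it depends on a PDF library.

```python
import json
import os

SIDECAR_SUFFIX = ".meta.json"  # hypothetical naming convention for this sketch


def sidecar_path(path):
    """Return the name of the separate metadata file that belongs to path."""
    return path + SIDECAR_SUFFIX


def write_sidecar(path, metadata):
    """Store metadata (a dict of strings) next to the file it describes."""
    with open(sidecar_path(path), "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)


def read_metadata(path, embedded_reader=None):
    """Merge sidecar metadata with metadata embedded in the file.

    embedded_reader is an optional callable taking a path and returning a
    dict. Non-empty embedded values win over sidecar values.
    """
    merged = {}
    sidecar = sidecar_path(path)
    if os.path.exists(sidecar):
        with open(sidecar, encoding="utf-8") as handle:
            merged.update(json.load(handle))
    if embedded_reader is not None:
        embedded = embedded_reader(path)
        merged.update({key: value for key, value in embedded.items() if value})
    return merged
```

Letting embedded values take precedence keeps the file itself as the authority, while the sidecar fills in whatever the author of the file left out.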
Re: Indexing PDFs
by Marco Schmidt - Mar 5th 2005 20:48:16
> Since the authors of pdf files do not
> always include metadata in the file,
> design a tool that handles metadata both
> ways, putting metadata into a pdf from
> a file or pulling the metadata out of
> the pdf into a file. If an extensible
> standard for the separate metadata files
> were made, a whole set of tools
> could be made that handles putting and
> pulling metadata from almost any file
> type that includes metadata pdf, mp3,
> ogg, mpeg, avi, and probably others.
Adobe's XMP is a common metadata framework. I think it is already used with
some PDFs as well.
I agree that a multi-level approach that tries different ways to access
metadata is preferable.
Marco
Filesystems, objects, databases and a command line interface...
by Jacob Sparre Andersen - Sep 1st 2001 04:52:35
Thanks for this inspiring article and comments. It got me started thinking
about how such an information retrieval system can be organised without
dropping too many of the benefits many people like with their Unix-like
systems. There is not necessarily much new in the text below.
Information about files should be stored in a structured form (i.e. a
"database"). This is the information I imagine is relevant to
index:
* author
* language
* title (filename?)
* keywords
* description
* file type
* encoding
* creation and modification times
* projects
* categories
* dependencies (A is constructed from B and C)
* relations (if you are interested in A, then you are also likely to want
to read B)
* full text
"URL symlinks" should be available as an alternative to
"bookmark files". And they should be treated as if they were
the files they refer to.
The indexing system should recognise file types and extract as much as
possible of the abovementioned information from the files.
"tar", "zip" and mailbox files - as well as other
composite file types - should be indexed both as a whole and as the
individual components.
If it is possible, then the indexing system should receive information
from the file system, when files are created, modified or removed.
Secondarily, the indexing system will have to scan the file system for
changes on its own.
There should definitely be some kind of shell/command line interface to
the system. And it should of course include file name completion-like
features.
A virtual filesystem for formulating queries against the database
should be considered. It would be great if I could do something like
this in my favourite shell:
$ <x><v>< ></><tab>
[ the program `xv` can only read images ]
$ xv /.filetype/image/<.><c><a><tab>
$ xv /.filetype/image/.category/L<tab>
Choose:
L(i)nux
L(E)GO
$ xv /.filetype/image/.category/L<i><tab>
$ xv /.filetype/image/.category/Linux/T<tab>
$ xv /.filetype/image/.category/Linux/Tux.png
(not bad, I would say)
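The lookup behind such a session can be sketched in Python. This is only a guess at the semantics: path components starting with a dot are read as attribute names, the component after each is its value, and a trailing ordinary component is a file name prefix. The in-memory index of dicts and the function names are invented for the example; a real implementation would sit behind the filesystem layer.

```python
def parse_query(path):
    """Split a query path into attribute constraints and a file name prefix."""
    parts = [part for part in path.split("/") if part]
    constraints = {}
    prefix = ""
    position = 0
    while position < len(parts):
        part = parts[position]
        if part.startswith("."):
            if position + 1 >= len(parts):
                raise ValueError(f"attribute {part!r} has no value")
            # ".filetype/image" becomes the constraint filetype == "image".
            constraints[part[1:]] = parts[position + 1]
            position += 2
        else:
            prefix = part
            position += 1
    return constraints, prefix


def query(index, path):
    """Return sorted names of index entries that satisfy the query path."""
    constraints, prefix = parse_query(path)
    names = []
    for entry in index:
        if not entry["name"].startswith(prefix):
            continue
        if all(entry.get(key) == value for key, value in constraints.items()):
            names.append(entry["name"])
    return sorted(names)
```

With an index containing an entry named Tux.png of filetype image and category Linux, query(index, "/.filetype/image/.category/Linux/T") returns that one name, which is what tab completion would offer at the last step of the session.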
It would be nice, if filesystems could store more of the basic information
about files (for example file type, encoding, language, author, keywords
and description).
/Jacob
PS: Why are the "UL", "OL", "LI" and
"BLOCKQUOTE" elements banned from the HTML formatting? :-(
This problem is solved very neatly already. Has nobody noticed...?
by Alex Farrell - Aug 31st 2001 09:48:22
BeOS solves this problem very nicely, using something similar to the
suggestion in the first post.
The filesystem (BFS) allows attributes (arbitrary data streams) to be
attached to filesystem elements, and these can be indexed by the OS.
Queries can be performed on the filesystem based on attributes, and the
results are served as a "directory".
This makes organizing files very easy.
For example, the ID3 tags of mp3 files can be stored as attributes
attached to each file. All mp3 files are then stored in a single
directory, and a query (pseudo-directory) is created which shows all files
belonging to, for instance, the "Rock" genre. Another query is
created which shows all files by the Rolling Stones. Another could show
all tracks written in the 80's etc etc. A particular track might show up
in 1, 2 or all of the queries, and this allows you to get to what you want
very quickly.
Feel like listening to the Stones? Just drag all files in the Stones
directory (query) into your mp3 player. Feel like a rock evening? Also
easy. Maybe you want to listen to all the old Stones stuff - easy again;
just create a query for all Stones music before the 70s.
Your queries can be stored for later use.
These queries are provided at a filesystem level, which means that all
applications can use them transparently. They are also instant, since they
are indexed.
Problem solved.
Of course now that BeOS has been slain, it's probably not a good OS in
which to invest your time. AtheOS ( http://www.atheos.cx ) promises
similar facilities in the future, but it's not there yet.
Anyone feel like working on a cool new filesystem? Maybe you should
contact the AtheOS author (I don't know him, so maybe he's not interested
in support, but maybe he is).
Re: This problem is solved very neatly already. Has nobody noticed...?
by Julian Regel - Sep 16th 2001 15:27:19
To expand on the above, BeOS did this well because it was so
integrated into the OS and all applications made use of it. Scot Hacker
had a great article on BeOS filetypes at www.byte.com a while back that
explained how apps such as the web browser would automatically fill in
certain attributes for the user (such as the source URL, date it was
downloaded, mimetype etc), and that mp3 rippers would populate the song
title, author, track length etc.
My understanding is that the issue of filesystem metadata has been
discussed on the Linux kernel mailing list, and Linus and co are trying to
work on a proper implementation.
Re: This problem is solved very neatly already. Has nobody noticed...?
by Rob - Sep 14th 2007 19:48:19
Doesn't NTFS already have this?
Dumb data and the file system
by Michael - Aug 31st 2001 07:29:34
I think that this article is brilliant, and also enjoyed the comments.
My perspective on the issues raised is this.
I think there is a fundamental legacy that is difficult to overcome - of
course it could be, but it is made very difficult by the commonly accepted
underlying abstractions.
The UNIX operating system made a very effective design decision,
"everything is a file", making the abstraction of
"file" more useful than it had been in previous operating
systems. This provided benefits similar to the benefits of closure in
algebraic mathematical structures, the most obvious being the ability to use
small programs in concert using pipes. There was a high level of
consistency in the implementation, allowing greater productivity for users
and allowing general utility programs to be very useful.
However, a "file" is just an abstraction. There are also
downsides to treating everything as a file, the most obvious being that
you need some intelligence about the data in the file to gain higher
levels of usefulness.
The continued use of "files" as the dominant user interface
abstraction for data storage - while at the same time loading more meaning
into that data - has led to monolithic applications, monolithic
file sizes, and increasingly complex file types.
So in the examples raised, we have "bookmarks", "mail
messages", "pdf files", etc., which are different concepts
managed by different applications. By creating different
applications to deal with them, we have lost something, though: we have
lost their similarities. It IS useful to think of them both at the level
of "file" - a chunk of data - and at a more meaningful level -
"bookmark".
Relational databases also had a brilliant idea - everything is a table -
and also raised the usefulness bar in some contexts for a whole bunch of
reasons, including greater description of what the data was.
However, I disagree with the idea of making a relational database
interface to the underlying data storage the only way to access
information. The data in relational databases is weakly typed. The
abstraction is brilliant, but it continues to encourage the separation of
the data from the meaning, which means that you still lose generality, i.e.
the data is still too dumb.
So, I think, there is a fundamental problem with the underlying
abstraction "file" (specifically as the user interface), and I
don't think progress is made by using the abstraction "table" or
for that matter "XML file".
We do however have an abstraction that could serve as the basis for
systems that could meet, or at least dodge, some of the objections raised. Here
goes...
"Everything is an object"
A system that was based on the user interface to data being a persistent
object store would, I think, provide a foundation that leads more easily to
the desired features.
With objects you have an explicit type hierarchy, giving you the ability
to manipulate objects at either a high or a low semantic level, and giving you
both general and specific tools.
An underlying object repository also gives you a way to deal with the
legacy problem of files. You can easily wrap a file into an object if you
happen to not have access to the underlying semantics.
Going further, different views onto your object repository and different
ways of locating the object(s) you want are required - but I think as an
abstraction objects are a much better starting point than files.
Just for clarity, I am talking about the abstraction presented to the user
interface. I am not advocating what I think should be used as a physical
storage abstraction.
The (very interesting) ReiserFS documentation makes an argument
for moving semantics into the file system implementation, which is another
way of saying that there is no clean, general decomposition between
the user interface abstractions and the data storage abstraction. But then
where ReiserFS is heading could be used as a persistent object store.
SO, as an OO bigot, objects (and then a lot of hard work) ARE the panacea
;-)
- Michael.
beginnings
by Zen Lunatics - Aug 30th 2001 21:15:04
I've been wanting a non-hierarchical organizational system for quite some
time. My main reason for wanting this is to organize browser bookmarks
that can belong to more than one category. So, I've written the beginnings
of such a system which can be found at zenlunatics.com
It's currently somewhere around the alpha stage and I haven't worked on it
in a while. I haven't written a bookmark manager yet but did write an image
viewer, an mp3 player, a simple note keeper and a utility for creating
catalogs from a file system. For the bookmark manager I'm thinking of
modifying gnobog, galeon or maybe mozilla (suggestions welcome). After
that I'd like to tackle the file system, possibly with a document
launcher although I recently read about multi-session support which may
solve that problem in a different way.
Anyway I'd really appreciate any comments on zl_catalog including
suggestions for a better name :-)
thanks,
sean
Re: beginnings
by Allan Fields - Aug 31st 2001 02:36:00
Hi,
Looks good, I think you and all the other authors that have been working
on these types of projects are heading in the right direction. We need to
make sure we can bridge between all the apps, solutions, FS, transport
mechanisms, etc. The library is definitely a great idea. Also an
exhaustive effort is probably required to rival the integration of some
commercial environments where integration is a goal and part of the
project.
I have visions of what the filesystem should be like and how it should
interface to the UI/shell. They are in many ways in agreement with Reiser
and the original Macintosh vision, and with some aspects of Windows
(although I am no Microsoft fan) -- and many different schools of thought! I
definitely agree with the author of the originating post; he has got some
great points!!
Thanks to all who are working on a solution to this existing and
persistent problem of computer science (which may have been solved already
in some past era, if only we could revive the great software of the
past!!! -- and which might already be solved in some expensive
commercial package that I can't afford and wouldn't want to use because of
the software model.)
Data and metadata
by Manuel Amador (Rudd-O) - Aug 30th 2001 18:40:28
The solution to these problems has been discussed on Tom's Hardware. The
proper solution is to have a filesystem that stores metadata, such as
ReiserFS, and a unified interface to it, such as OMS (an XML dialect and
categorization/metadata standard for storing metadata).
Naturally, it would require operating system kernel support, application
VFS support and application front-end support, so it might well be a
herculean task. Whatever approach is used to solve the problem, it has to
keep in mind that dumping the metadata while transferring files across the
internet is unacceptable. MacOS had that solved with bundles. Why they
dumped support for it in Mac OS X, I don't know.
Mail / FS
by belg4mit - Aug 30th 2001 13:39:11
This is exactly what mh-mail/nmh is designed for:
"nmh consists of a collection of fairly simple single-purpose
programs to send, receive, save, retrieve, and manipulate e-mail messages.
Since nmh is a suite rather than a single monolithic program, you may freely
intersperse nmh commands with other commands at your shell prompt, or
write custom scripts which use these commands in flexible ways."
http://www.mhost.com/nmh/
And if you must have a GUI there is xmh and exmh, or mh-rmail for emacs
etc...
Storing files
by Thomas Leonard - Aug 30th 2001 11:33:55
When a file is being downloaded, the user is required to supply a
filename and a path. I would really like it if authors of software (like
Mozilla) gave us a commandline with file completion to do this. I find the
GUI interaction that they force me to have extremely inefficient, and it
costs so much time that when I'm in a hurry, I tend to misclassify an
incoming file.
This is perhaps the biggest problem -- it's so easy to just dump a file in
the default directory that people don't take a couple of seconds to put it
somewhere sensible.
A solution? Get rid of the save dialog box and replace it with a draggable
icon. To save, the icon is dragged to a filer window, directory on the
panel, etc. Common save destinations (eg, the project you're currently
working on) can then be kept handy along the bottom of the screen (or
wherever). See here for an implementation
of this system.
As any computer scientist knows, spending a little extra time storing your
data can help a lot when it comes to retrieving it! BTW, I agree that an
indexing agent should update as the filesystem is changed. The current
massive-scan-once-a-day is slow and irritating.
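Short of having the kernel notify the indexer of every change, one cheap improvement over a full daily re-index is to compare a fresh snapshot of modification times and sizes with the previous one and re-index only what differs. The Python sketch below illustrates that idea; it still walks the tree, so it is a stand-in for true change notification rather than an implementation of it, and the function names are made up.

```python
import os


def snapshot(root):
    """Map every file path under root to its (mtime in ns, size in bytes)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # The file vanished or is unreadable: leave it out.
                continue
            state[path] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(old, new):
    """Return sorted lists of (added, removed, changed) paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(path for path in new if path in old and new[path] != old[path])
    return added, removed, changed
```

An indexer would keep the last snapshot on disk, call diff_snapshots against a new one, and then re-read only the added and changed files while dropping the removed ones from its index.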
Re: Storing files
by Allan Fields - Aug 31st 2001 02:12:34
All good ideas. I think these types of UI innovations are what we all need!
Re: Storing files
by Allan Fields - Aug 31st 2001 02:52:42
> A solution? Get rid of the save dialog
Actually, come to think of it, there's no reason to get rid of it; just implement
another approach and allow each to be configured on or off.
Re: Storing files
by Adam Glasgall - Sep 8th 2001 21:48:00
Didn't Acorn's RiscOS do this wrt saving stuff?
> A solution? Get rid of the save dialog
> box and replace it with a draggable
> icon. To save, the icon is dragged to a
> filer window, directory on the panel,
> etc. Common save destinations (eg, the
> project you're currently working on) can
> then be kept handy along the bottom of
> the screen (or whereever). See here for
> an implementation of this system.
[»]
Re: Storing files
by Thomas Leonard - Sep 10th 2001 08:15:17
> Didn't Acorn's RiscOS do this wrt saving
> stuff?
Yep; my implementation looks very similar to it.
[»]
File naming rules
by Gavin Brown - Aug 30th 2001 10:36:47
More thoughts on file naming rules:
http://www.everything2.com/index.pl?node_id=530288
[»]
Re: File naming rules
by Rob - Sep 14th 2007 19:41:45
> More thoughts on file naming rules:
> http://www.everything2.com/index.pl?node_id=530288
Link's broken. Is that article still around?
[»]
There are tools...
by Sorin Milutinovici - Aug 30th 2001 08:10:50
I had the same problem, until I discovered that there are a lot of tools
that can help. Sure, one has to find all those tools and select the best of
them. The starting point for me was the desire to have one (or two) places
in which my important stuff goes, ideally with a common interface for all
of it. And the only environment that is ready to deal with all sorts of
objects is the web. Therefore, my way of solving the problem is:
- Use a Perl- and PHP-enabled web server on your own computer
- Use a personal information system (there are several out there; I use
  MyPhPIM) with a web interface, connected to a MySQL database. All your
  e-mail, notes, to-dos, etc. go into that database.
- Use a bookmark manager connected to the same MySQL server and with a web
  interface
- Use a web file manager system (such as phpFileFarm) to work with the pdf,
  html, ps files
- Use a web photo album to keep your photos (of course with a database back
  end)
- Use a cvs system for ASCII work in progress and install a webcvs system
  (I use viewcvs).
- Finally, use HtDig or another search engine to index the whole lot.
  Configure htdig to search in separate directories or in all of them.
Several more ideas:
- use the same database engine (MySQL or PostgreSQL or another) to
  minimize the load
- back up all the databases and the cvs system daily, on a separate
  partition (or computer)
- back up the pdf, ps, html directory weekly
And to add a little touch, make a script that checks the following into the
cvs system daily:
- ls -lR output for important directories
- system settings
Your computer will have to work for an hour during the night but... you
have a clever system
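The daily "ls -lR into cvs" step could look something like the sketch
below. It is an assumption-laden illustration, not the script described
above: the names listing and write_snapshot are invented here, the output
only resembles ls -lR, and the actual cvs commit is left out.

```python
import os
import stat
import time

def listing(root):
    """Return a deterministic, ls -lR-like text listing of files under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            lines.append("%s %10d %s %s" % (
                stat.filemode(st.st_mode),
                st.st_size,
                time.strftime("%Y-%m-%d %H:%M", time.localtime(st.st_mtime)),
                os.path.relpath(path, root),
            ))
    return "\n".join(lines) + "\n"

def write_snapshot(root, out_path):
    """Write the listing to out_path; return True if the content changed."""
    new = listing(root)
    try:
        with open(out_path) as f:
            old = f.read()
    except FileNotFoundError:
        old = None
    if new == old:
        return False  # nothing to check in today
    with open(out_path, "w") as f:
        f.write(new)
    return True
```

If out_path lives in a cvs working copy (and outside root), a nightly cron
job can call write_snapshot and then run cvs commit whenever it returns
True, so the repository history shows what changed and when.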
-- Sorin M
[»]
Re: There are tools...
by Caglios - Aug 31st 2001 03:33:26
Yes, the tools are there. But more often than not you need to write them
yourself. Only in the last few months have I got my scripts down so that
not even a tmp file escapes my wrath (Yay for PERL).
The overheads for this probably aren't worth it, and there are still a few
bugs. The package (as yet unreleased) needs to work at a relatively low
level to query the fs to see which files have been opened (it presently
only works on x86 machines). Another cron job takes an image of the
complete filesystem once a day, compares it against the previous day's,
sees which files have been opened in the meantime, and stores this and
other data in a MySQL table. Then... every month, like clockwork, I switch
my pootie on and it takes about an hour to archive all of the files that
went unused during the period.
After that, it's just a matter of scanning through the .zip files and
removing what I don't really need.
Seems a bit gratuitous, really. But it works.
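For readers without a low-level hook like that, a much cruder
approximation is to ask the filesystem for each file's last access time.
The sketch below is not the unreleased package; unused_files is a
hypothetical name, and the approach assumes access times are actually
recorded (mounts using noatime or similar options will make it
unreliable).

```python
import os
import time

def unused_files(root, days, now=None):
    """Return paths under root whose last access is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)
```

The returned list is only a set of candidates for archiving; nothing is
moved or deleted by this function.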
[»]
Re: There are tools...
by Allan Fields - Sep 2nd 2001 05:26:14
> Yes, the tools are there. But more
> often than not you need to write them
> yourself. Only in the last few months
> have I got my scripts down so that not
> even a tmp file escapes my wrath (Yay
> for PERL).
That is a good point; sometimes the best way to do it is your own script.
I am also fond of Perl for some tasks.
> Seems a bit gratuitous, really. But
> it works.
Hmm.. seems like a good way to archive, but remember archiving offline
isn't always the right/full solution.. Depends on people's usage patterns,
I guess. :)
[»]
Re: There are tools...
by Allan Fields - Sep 2nd 2001 06:58:20
> I had the same problem. Until I discovered
> that there are a lot of tools that can help.
I've been looking for tools, but even if I found a tool for each
application (and it was open, say), it still wouldn't solve the closure
issue fully. You can get pretty close though by using all web-based
tools.
> has to find all those tools and select
> the best of
> them. The starting point for me was
> the desire to
> have one (or two) places in which my
> important
> stuff goes. Ideally a common interface
> for all this.
Yeah, it would be nice to have one interface for all of the tasks you
mention. Also, can you post a small list of links to the packages that
you have found? That might be helpful for everyone here trying to set up a
repository. I have searched Freshmeat and SourceForge but haven't yet got
a good idea of everything that exists and the extent of the work on these
solutions. I know there are already a lot of commercial solutions to do
these types of things... I imagine most are for large business/project
management/office problems though. I wonder if any exist for research
work.
> And the only environment that is ready
> to deal with
> all sort of objects is the web.
> Therefore, my way of
> solving the problem is:
>
> - Use a perl, php enabled web server
> for your own
> computer
> - Use a Personal Information system
> (there are
> several out there, I use MyPhPPim)
> with a web
> interface, connected with a mysql
> database. In
> that database goes all your E-mail,
> notes, todo-s,
> etc.
> - Use a bookmark manager connected
> with the
> same Mysql server and with a web
> interface
> - Use a web file manager system (such
> as
> phpFileFarm) to work the pdf, html, ps
> files
> - Use a web photo album to keep your
> photos (of
> course with database back end)
> - Use a cvs system for ASCII work in
> progress and
> install a webcvs system (I use
> viewcvs).
> - Finally, use HtDig or another search
> engine to
> index the whole stuff. Configure
> htdig to search in
> separate directories or in all.
>
> use the same database engine (mysql or
> postgress
> or another) to minimize the load
I agree with trying to get everything into one DBMS at least, even if
there isn't seamless integration. Even more ideal is to have strong
linkage between all the member DBs of the DBMS.
Also, it appears PostgreSQL and MySQL are a little behind Oracle in some
of the Object over Relation framework features. Even nicer is the OO or
Object-Relation ODBMSes like Cache, DB40 or (open source example) GOODS
and Gigabase.
On the DB access layer another project that caught my eye was ColdStore
(persistence framework using simple DB). And then there is J2EE for Java
which is something to look at for Java apps.
There are lots of things to look at, and there are many projects addressing
specific sections of the problem...
---
Allan Fields
[»]
Re: There are tools...
by Sorin Milutinovici - Sep 3rd 2001 07:33:20
> I've been looking for tools, but even
> if I found a tool for each application
> (and it was open say), it still doesn't
> solve the closure issue fully. You can
> get pretty close though by using all web
> based tools.
Yes, web-based tools are probably the most
complete ones. And yes, sometimes you only get
very close. But most of the time, since the
web tools are (some of them at least) fairly
standard, you can adapt yourself to the tools.
> Also, can you post a small
> list of links to the packages that you
> have found? That might be helpful for
> everyone here trying to setup a
> repository.
I will post several links but, as usual, you should
check for yourself. Open source projects especially
are sometimes moving very fast. And let's hope
that others will reply and add some more.
About the repository: I use the common cvs
system that can be found at
www.cvshome.org
The cvs from there is used from the command
line; it has no graphical or web interface.
But once you've set up a repository (or more) you
can use several tools that are available:
CVSWeb
http://stud.fh-heilbronn.de/~zeller/cgi/cvsweb.cgi/
This is a single perl script that does not need a
database. You need to have the repository set up
and that's it
ViewCVS
http://freshmeat.net/projects/viewcvs/
The one I am using now. It is based on CVSWeb
but is written in Python. You can download tarballs
from your repositories, and it has syntax highlighting
for a lot of file types (based on enscript). It can be
used with a MySQL database, but this is not compulsory.
The database does not keep the repository, just
information.
Chora
http://horde.org/chora/
I haven't personally tried this one. But I've seen
some online repositories, and it matches the
previous one pretty closely.
Freepository
www.freepository.com
The one I'll use in the future :) if I have time to
move all my stuff from MySQL to PostgreSQL. It is a
fully web-based tool: checkin, checkout, whatever.
PostgreSQL backend.
For CVS documentation or tutorials: go to the
ViewCVS site, there are several links.
In principle, my web site has to have:
- A news system - something that grabs the news from slashdot,
  freshmeat, etc.
- A calendar
- A bookmark manager
- A place for notes
- A place to put small articles that I find on the web
- A photo gallery
- An e-mail system
- A file manager
- A CVS interface
- An interface to the computer administration tools
- Web ssh login
There are several tools that can do this. I will
mention two (although I am sure that more, and
maybe better, can be found).
PhPGroupware - multiuser groupware tool that has
everything in the above list, except the last two
(as far as I know). The cvs interface is Chora,
mentioned above. Very actively developed (if you
go on SourceForge you will almost always see it in
one of the first three places).
www.phpgroupware.org
PhPNuke + several modules (News, Gallery,
Calendar, etc)
All can be found on the phpNuke site:
www.phpnuke.org
PhPNuke is a system for building news sites but you
can use it for all of the above list, except the last
5.
I am using PhPNuke now. For other tasks:
E-mail system: There are a lot of webmail
programs. If you have the mail delivered to your
machine then you can use Neomail
(http://neomail.sourceforge.net/) or Openwebmail,
and many more.
Web based file managers: an interesting one is PhPFileFarm
Interface for administration: the best one seems to
be WEBMIN (I am running Linux, you should check
their page for other systems).
http://www.webmin.com/webmin/
Webmin has also a file manager and a ssh login
shell, and much more.
Or you can use a combination of:
MyPhPIM
http://sourceforge.net/projects/myphpim/
which has mail, calendar, todo, address book
and other tools described above.
> Also, it appears PostgreSQL and MySQL
> are a little behind Oracle in some of
> the Object over Relation framework
> features. Even nicer is the OO or
> Object-Relation ODBMSes like Cache, DB40
> or (open source example) GOODS and
> Gigabase.
> On the DB access layer another project
> that caught my eye was ColdStore
> (persistence framework using simple DB).
> And then there is J2EE for Java which
> is something to look at for Java apps.
I am not very familiar with object-oriented
databases. PostgreSQL has table inheritance though.
But for such a project (which is personal and therefore
single-user) a lightweight database seems the best
choice. This is why I am not yet convinced I should
move my system from MySQL.
>
> There are lots of things to look at,
> and there are many projects adressing
> specific sections of the problem...
This is true. It will be more than nice to start a
project for this - a personal web content manager.
Oh, I mentioned HtDig. It is a search/indexing
engine that can be found at:
www.htdig.org
>
> ---
> Allan Fields
-- Sorin M
[»]
What about using the remembrance agent
by virtualizer - Aug 30th 2001 07:07:02
I have had more than decent success using Bradley Rhodes' Remembrance Agent.
It does a very good job of providing me with JITIR (just-in-time
information retrieval).
[»]
Re: What about using the remembrance agent
by Jean-Marc Liotier - Aug 30th 2001 07:37:39
Trees are inherently limited to single entry. Organizing documents in a
single tree will inevitably hit that wall. The only way to break through
is to use thesaurus based keywords. The snag is that thesaurus building is
a task of pharaonic proportions.
The quick and dirty approach that I used successfully when in dire need
of hacking my way through 40GB of ps, pdf, txt, doc, ppt, html and xls
documents is to use a full text indexer with external parsers. ht://Dig
has done a great job (although phrase searching is sorely lacking for now).
Thesaurus based keyword indexation is best because documents can be hit
from any semantic angle. I would love to have the time and resources to do
it for my company. But in the real world, meaningful file names, a basic
and sane tree and full text indexing on top of that will do cheaply.
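To make "full text indexing on top of meaningful file names" concrete,
here is a toy inverted index. It is a sketch only, not how ht://Dig works
internally: build_index and search are names invented for this example, it
handles plain text only (no external parsers for pdf, doc and so on), and
like the setup described above it has no phrase search.

```python
import os
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")

def build_index(root):
    """Map each lowercase word to the set of files under root containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read().lower()
            except OSError:
                continue  # unreadable file; skip it
            # Index the file name as well as the contents, so that
            # meaningful names are searchable too.
            for word in WORD.findall(name.lower() + " " + text):
                index[word].add(path)
    return index

def search(index, query):
    """Return the sorted files that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

Because the file name is indexed together with the contents, a document
can be hit either by what it says or by what it was called.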
As far as mail is concerned, the single entry tree problem is somewhat
alleviated by virtual folder approaches such as the one in Evolution.
[top]
[»]
Re: What about using the remembrance agent
by Jean-Marc Liotier - Aug 30th 2001 07:41:46
Sorry, I hit "reply" and forgot to modify the title. My post's title should
read : "Experience dealing with large numbers of heterogeneous documents".
Relational data rules !
[»]
when in doubt use brute force
by rumblefish - Aug 30th 2001 05:18:08
I think all these fancy techniques are not really
needed. Look at history: there were a lot of early
search engines and systems designed by
architecture astronauts, such as WAIS, which
never got anywhere. In contrast, look at the
absolutely brilliant Google, which cares nothing for
categories or semantics. I use Google for
everything, in preference even to vendors'
categorised support pages for my support issues.
When in doubt, use brute force.
http://www.tuxedo.org/~esr/jargon/html/entry/brute-force.html
[»]
Re: when in doubt use brute force
by Matthias Arndt - Aug 30th 2001 05:37:16
Why use a search engine in your $HOME?
Simply delete files not needed and backup everything you may need in the
future to an external storage device like a tape archiver or a cd-r.
Your $HOME will stay small and tidy.
Just make sure to go through this procedure once a week or once a month.
My $HOME is organized that way. However I store HTML, downloads, pictures
and other non-plain-text information in there. I use some well known
subdirectories and it works perfectly. Simply tidy up!
Why use complex software for things that can be achieved with a little
self discipline or even cronjobs?
-- ICQ: 40358321
WWW: http://www.asmsoftware.de/
PGP: http://www.asmsoftware.de/marndt.pgp
[»]
Re: when in doubt use brute force
by Greg Holt - Sep 7th 2001 07:56:31
> Simply delete files not needed and
> backup everything you may need in the
> future to an external storage device
> like a tape archiver or a cd-r.
I've seen this recommended by several folks, so don't think I'm singling
you out...
Simply archiving and deleting things does not solve the problem; it makes
it *worse*. How do you find out what the hell you've archived? "Gee, I
know John sent me an article he wrote on graphing small population
relationships, but which of these 50 CDs or 150 backup tapes did I put
that on?"
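One modest way to soften this is to keep a small catalog on disk of what
went onto which CD or tape. The sketch below is just an illustration: the
function names, the JSON catalog format and the media labels are all
assumptions made up for this example, and it keys files by base name only,
so two different files with the same name would be merged.

```python
import json
import os
import tarfile

def archive_with_catalog(paths, archive_path, catalog_path, label):
    """Write paths into a tar.gz archive and record them in a JSON catalog."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))
    try:
        with open(catalog_path) as f:
            catalog = json.load(f)
    except FileNotFoundError:
        catalog = {}
    # Remember which medium (e.g. "CD 17") each file name was written to.
    for path in paths:
        catalog.setdefault(os.path.basename(path), []).append(label)
    with open(catalog_path, "w") as f:
        json.dump(catalog, f, indent=2, sort_keys=True)
    return catalog

def where_is(catalog_path, substring):
    """Return {file name: [labels]} for catalogued names containing substring."""
    with open(catalog_path) as f:
        catalog = json.load(f)
    return {n: v for n, v in catalog.items() if substring.lower() in n.lower()}
```

The catalog stays on the hard disk after the archive has been burned, so a
later where_is(catalog_path, "graphing") answers the "which of these 50
CDs" question, at least for files with meaningful names.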
Greg
[»]
The tree structure is one problem
by Jerry - Aug 30th 2001 05:08:25
I also ran into the data organization problem
in 1993 (the last time I lost files).
I found, among other things, that the file metaphor
and strict tree structure are a major mismatch
with human cognition.
Askemos was created to tackle the problem.
It really helps.
BTW: Askemos is GPLed software
(soon to be recategorized at freshmeat),
which faces a legal threat at the moment.
Please help to keep it free, download!
Thanks
[»]
Re: The tree structure is one problem
by Allan Fields - Aug 31st 2001 02:07:50
Jerry,
I've taken a brief look, and find the structure a little daunting (also
unfortunately I don't speak much German :( ) -- some of the concepts seem
neat.. I am interested to find out more, so I'll take another look some
time soon. Good to see people working on solutions... One thing we
perhaps should be careful about is making sure these solutions have a tight
level of integration with existing facilities, so that they are intuitive
to users and don't appear to be a layer on a layer on a layer of storage
(the multiplicity of trees, as mentioned above). Yours appears to also be
an anonymous sharing protocol?
[»]
Re: The tree structure is one problem
by Jerry - Aug 31st 2001 05:14:00
> I've taken a brief look, and find the
> structure a little daunting (also
> unfortunately I don't speak much German
> :( ) -- some of conepts seem neat.. I am
Thanks. Yes, I know there are several years
of work to be documented. I appreciate
all comments on how to improve the documentation
structure. Promise: the German is going to be translated.
> One thing we perhaps should be careful
> of is to allow these solutions to have a
> tight level of integration to existing
> facilities so that they are intuative to
> users and don't appear to be a layer on
That's a main point of Askemos.
It was actually started, when I realized,
that I can understand files, but my dad,
a philosopher, could not.
It's certainly not his fault.
> a layer on a layer of storage (The
> multiplicity of trees - as mentioned
> above). Yours appears to also be an
That's about technology. Askemos stores
its data in one repository (two files,
provided by RScheme's pstore module).
Within that repository, you find
internally hash tables and document trees.
The technology is called pointer swizzling at
page fault time, which says it all.
> anonymous sharing protocol?
Not exactly, not yet. One is needed.
Askemos is by definition based on standards
wherever feasible.
For the sharing, I currently go through SOAP.
This is not the final solution.
[»]
Re: The tree structure is one problem
by Rob - Sep 14th 2007 19:40:46
I'm afraid that went over my head. What exactly is Askemos? In light of all
the improved search tools we have, is it still as strong a solution today
as it was 6 years ago?
[»]
Right On!
by Allan Fields - Aug 30th 2001 04:09:48
I think you raise a very important
issue. This all makes me think of the
"Future Vision" section of the Namesys
page, where Hans Reiser talks about the
need for a mathematical closure between
applications and bringing more advanced
features into the FS much like the idea
of 'the database is the
filesystem'.
Sometimes I am rather stumped as to how
I can organize all my files well, simply
because of the sheer volume. Sometimes
I wonder where it all comes from. =)
One thing is for sure, the current
system isn't making it natural to file
it all away as it comes in.
Because I hate recasting my thoughts
into the separate islands that are file
formats and specifically new file
formats that I am unfamiliar with, just
to decode them at a later date, I don't
feel like I can naturally organize my
files, ideas, correspondence, etc. in an
intuitive or overall advanced fashion.
There is always the fear that once it is
all organized, it will be organized in
the wrong file format/directory
structure for when I need to use it all
again or that when the next file format
comes along, I'll have to do it over
again. Then there is the issue of
remembering what the files that I place
in directories are for, where/when I
saved that specific file I have in mind,
which files it related to -- and
how it ties into the concept... Add
to that the number of computers I use
for personal use, work, etc. It all
seems unmanageable if I don't just cast
off the old, and archive it all up for
that 'some day, when I am gonna organize
this all'.
Additionally, when I try organizing PDFs
(that contain specs or product
information for instance) and all the
other files that I yank off various
pages from the Internet, I am at an even
greater loss as to how to integrate it
into a logical structure and relate it
to the existing files. It is much like a
problem where you can define so many
bins that you don't know or remember
which to drop something in, thus
defeating their purpose. What if it
belongs to 2 or more drop-boxes? And it
sure would be nice to link those PDFs to
the source websites and to the related
searches on Google.
How do I save my concepts of linkage
between all these files and URLs and
emails? I can't remember it all,
especially after a few years. It's a
mess!!
I fear the trend towards a small
country's population of different
solutions above the scope of the OS,
with thousands of different approaches
to implementing the same structure over
and over, without any coordination.
XML can help, but I am not confident
that any markup/hypertextual system
alone is enough for anything past a
level of interoperability. Not like
interoperability would be bad or
anything, but...
It might be nice if we could get something to
organize it all, that is a unified standard, and heck, even cross
platform. Can anyone suggest what I should look
at?
Groupware (PHPGroupware looks promising)
and tools like Livelink (not Open Source) are a
start, but they also fall short in that
they all just build on top of the file
system and operating system, removing the
convenience of the UNIX idea that
'everything is a file', accessible at a
flat scope. Maybe our filesystems just aren't
advanced enough to handle the load we
are trying to put on them.
In the case of commercial tools that
might suit my needs, they are all too
pricey. I also refuse to store my data
in any Windows file formats.. too many
bad experiences, I don't buy the
Microsoft integration concept.. Call it
a lack of trust that Microsoft can ever
be compatible without sucking my time
and dollars into a downward spiral of
non-addressable bugs and unholy
propriety that requires me to switch
from Windows at a great cost of time
in the end anyway.
I might have some ideas for the KDE
team so that they can avoid the same
problems. Can we please get past
this dark age of the stand-alone
application to something that finally
draws some closure? ;)
---
Allan Fields
[»]
It all comes down to organization skills
by Eli Sand - Aug 30th 2001 02:46:30
I've not been dealing with computers as long as most people, but in my
experience, you can keep any operating system or any file system in
general neat, tidy, easy to search, and not full of unknowns by simply
keeping it organized yourself.
Most Linux distributions suffer from tons of useless garbage lying around
in common directories such as /usr/bin (bet you don't even know what half
those programs are for!) and the like. I find that most people who use
computers tend to be messy slobs when it comes to organizing their data.
My $HOME is virtually blank: I have my public_html and a couple of things I
save for later reference, which I quickly move/delete when I'm done with
them. I've tidied up every directory in Linux, including /usr/bin (and I
know what every program does). I know where every file is, and only
resort to using locate or find when I need to get a list of files that may
match a specific pattern (eg: manually removing a program after install and
not watching 'make install'). My Windows partitions are kept spanking
clean, and I know what stuff should and shouldn't be there. The only
exception is my System directory in Windows, I just try to keep from
installing useless programs and knowing what .dll's I need.
So when you step back and take a look at it all, you don't need to
re-invent the wheel on where things should be stored, how to store them,
and all that jazz.
Don't collect so much useless crap you think might be useful - keep it if
it IS useful and bloody well use it, and when you're done, delete what you
don't need!
If you don't like a file name, read up on the ren/mv commands! Amazingly
I have yet to come across a file system that DOESN'T let you rename
files!
As for other data files that contain non-plaintext data, use a useful
filename (after all, it's text, so renaming it will not break stuff) and
store it somewhere that makes sense. JUNKPDF isn't a useful name - try
something like Filesystems-renaming.pdf
Only flaw to my rant is that certain software bundles do have dorks who
love to make a mess of your nicely structured file system. I just don't
use their software or if it's for Linux and I've got their source, I try
to edit what I can to make it better (then send them a .patch - usually
pisses them off :)
[»]
Re: It all comes down to organization skills
by Allan Fields - Aug 30th 2001 04:35:47
Forgive me for saying so, but the above sounds simplistic. You may be
looking at the simple approach, where there generally isn't the need to
implement anything like this, because usage doesn't involve large volumes
of information from many sources.
Deleting seems like a good approach for a workaround, but many people
have home directories chock-full of "good" stuff that they really can use
(if only they could correctly link it all together/index it in a timely
fashion) and wouldn't think of deleting, because there WAS a reason they
got it in the first place, and there is still a reason to keep it.
Pruning is OK, but don't chop down the tree. (I agree with keeping your
binary trees clean, why not, right? But data is a little different. Isn't
that the whole reason we have a PC and not just some terminal?)
[»]
Re: It all comes down to organization skills
by Eli Sand - Aug 31st 2001 01:10:45
> ... because usage doesn't involve
> large volumes of information from
> many sources.
When you have large volumes of information from many sources, that is
called a data repository, also known as a library. When you have a
library, you have an interface to get the data you want. I believe that a
filesystem should do nothing more than what its original intent was - to
store data. If you want to retrieve data by 'searching' the stored
contents of all your files, you should be using some sort of interface to
retrieve that data. It's not the fault of the OS or filesystem that you
can't find your stuff - it's your fault.
> ...Deleting seems like a good approach
> for a work around, but many people have
> home directories chalk full of "good"
> stuff, that they really can use ...
Organization. If you're unorganized, all that 'good stuff' is
theoretically useless if you can't find it when you need it. Also, if you
keep enough junk around that you think is 'good stuff' and you never really
use it, chances are that when you go to use it, it's been superseded by
something far superior, or something else completely different (*laugh* ...
ipfw, no wait, ipfwadm? nonono, ipchains!, no wait... iptables - that's it!).
Like I said, it all comes down to organizational skills - if you aren't
adept at keeping structure in your data, you shouldn't be allowed to find
what you want when you want it.
[»]
Re: It all comes down to organization skills
by Allan Fields - Aug 31st 2001 01:36:27
> When you have large volumes of
> information from many sources, that is
> called a data repository, also known as
> a library. When you have a library, you
> have an interface to get the data you
I don't yet.. that's exactly what I need, but I don't have an interface to
"get the data" -- just a FS which I chuck stuff
in until I do. Where else would I put it lacking a repository?
I could put it into a temporary repository that doesn't have all the
features I need yet like, say building a searchable web page set. But why
do that if I can do it once, properly. No half-measures!
Yes it is a problem of knowledge management.
The knowledge management systems that exist
don't do it for me, and many are commercial, so
they are closed. No thanks. Too many bad experiences.
None of them integrate tightly enough. I cited Livelink already; that is
a good example of something that makes the web a repository, but is
commercial and doesn't have everything I would need to set up the
repository properly.
> want. I believe that a filesystem
> should do nothing more than what it's
> original intent was - to store data. If
> you want to retrieve data by 'searching'
> the stored contents of all your files,
> you should be using some sort of
> interface to retrieve that data. It's
> not the fault of the OS or filesystem
> that you can't find your stuff - it's
> your fault.
I'm sorry, I know I screwed up, I'll do better next time. I knew that the
filesystem wasn't designed for what I am trying to do..
That's why I've just been storing data on it, like it was intended to be
used.
But wait a minute, that was my point! Maybe the filesystem needs to be
extended -- not necessarily all in the kernel space, but the user space as
well!
> Organization. If you're unorganized,
> all that 'good stuff' is theoretically
> useless if you can't find it when you
> need it. Also, if you keep enough junk
> around that you think is 'good stuff'
> and you never really use it, chances are
Actually it is still useful, it's just it is less likely it will be used
effectively.
> Like I said, it all comes down to
> organizational skills - if you aren't
> adept at keeping structure in your data,
> you shouldn't be allowed to find what
> you want when you want it.
I'm allowed to do whatever I please, regardless of qualification or skill,
with my own hardware (within the bounds of law). I think I should be able to.
And I am trying to better keep structure *in my data* -- not my head!
That's why the computer should store structure or metadata, not just data
that requires us to worry about the enforcement of the structure as an
afterthought. That is error prone, as we are not machines.
Isn't the computer supposed to help us store, retrieve and compute
information? Why not design it better to do so? I'm not the computer,
the computer is the computer. I want to be able to have it present the
information in an optimal and timely fashion, so that it can offload some
of the burden of remembering the structure of my data.
---
Allan Fields
[»]
Perishable Files
by malcolm - Oct 28th 2002 09:46:46
I remember reading somewhere about a file system
that deletes files automatically that haven't been accessed
for a certain amount of time.
Strange as this may seem, it does make a certain amount of sense. If you
haven't accessed a file for 5 years, you could probably do without it. In
fact, you would probably not even notice that it had faded away.
[»]
Good organizational skills are a help, BUT...
by Rob - Sep 14th 2007 19:46:53
Good organizational skills are a help, but so are good user interfaces. One
is not a replacement for the other, and in fact, IMHO they reinforce each
other.
[»]
The Semantic Web
by Joerg Fehlmann - Aug 30th 2001 01:36:41
There is a nice article, "The Semantic Web" by Tim Berners-Lee et al., on
the web aspect of information storage/retrieval (subtitle: "A new form of
Web content that is meaningful to computers will unleash a revolution of
new possibilities").
[»]
The right software ...
by dino - Aug 30th 2001 00:44:03
... is the answer. Better yet, the right 'filesystem' ... an RDBMS
filesystem where files can be categorized and tagged quickly
(point-and-click -- not hand-typed). Then, while browsing your filesystem
any one file could potentially be found under more than one
'directory'.
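A rough feel for the "one file under more than one directory" idea can be
had with a single relational table of (path, tag) pairs. This is a sketch
under assumptions, not a filesystem: it uses SQLite from the Python
standard library, the table layout and the names open_store, tag and
files_with are invented for this example, and tagging is done by function
call instead of point-and-click.

```python
import sqlite3

def open_store(db_path=":memory:"):
    """Open (or create) the tag database and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS tags ("
                 "path TEXT NOT NULL, tag TEXT NOT NULL, "
                 "PRIMARY KEY (path, tag))")
    return conn

def tag(conn, path, *tags):
    """Attach any number of tags to a path; repeats are ignored."""
    conn.executemany("INSERT OR IGNORE INTO tags (path, tag) VALUES (?, ?)",
                     [(path, t) for t in tags])
    conn.commit()

def files_with(conn, *tags):
    """Return sorted paths carrying every one of the given tags."""
    tags = tuple(sorted(set(tags)))  # drop duplicate tags in the query
    if not tags:
        return []
    marks = ",".join("?" * len(tags))
    rows = conn.execute(
        "SELECT path FROM tags WHERE tag IN (%s) "
        "GROUP BY path HAVING COUNT(DISTINCT tag) = ? ORDER BY path" % marks,
        tags + (len(tags),))
    return [r[0] for r in rows]
```

Each tag behaves like a virtual directory: files_with(conn, "filesystems")
lists everything filed under that heading, and a file tagged twice shows
up under both headings without being copied.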
The methods mentioned in the article are sound. But trying to keep ALL
your files on disk seems wasteful. Old stuff not seen in years should
probably be backed-up, catalogued and removed.
Hierarchical storage requires rigid organization and diligent maintenance.
It has probably outlived its usefulness.