WebApp Sec mailing list archives
Re: Controlling access to pdf/doc files (db "better" than filesystem?)
From: Ido Rosen <ido () cs uchicago edu>
Date: Sat, 28 Feb 2004 14:54:57 -0600
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 28 Feb 2004 11:13:21 -0800 "David Wall @ Yozons, Inc." <dwall () yozons com> wrote:
that in SQL Server is that all data in SQL Server is split over ~8k pages. When you add a BLOB it needs to be split into 8k chunks. When you

But filesystems also store data in pages, often much smaller than an 8k chunk.
I agree that storing files together with their metadata in a database is a better solution here than storing bare files on the filesystem. It's also probably more secure, since the web developer is less likely to botch permissions, security, or sanity checks, and since most database systems already have some sanity checks built in.

Your reasoning in that last sentence is a bit off, though. Database systems (such as MySQL, PgSQL, ThinkSQL, and MSSQL) must all use the filesystem themselves, so their 8k chunks may not line up with the filesystem's blocks, and the storage may be out of phase. This is just a result of overlaying one storage paradigm on another, and it shouldn't cause too much trouble speed-wise. Still, by adding a layer on top of the filesystem, you do increase the likelihood of inefficiency.

That said, there's a counterargument: databases, or at least smart ones, are built to cache data efficiently in memory. If your database server has enough memory, it may even be faster than serving the file directly off the filesystem. The reasoning is that the filesystem cache (if there is any at all) also holds shared libraries and other files that are currently executing, which are given priority over any sort of data caching. This cache is also limited in size in most implementations, so as not to take up too much precious RAM. Databases, however, are generally built on the assumption that if you are using a database server for anything that could benefit from significant caching, or for major resource-intensive tasks (like serving hundreds of thousands of users), then the database server will be the prime service on the machine and may therefore take up significant resources (specifically, cache more data in memory). So, in some situations I'd imagine database file storage would in fact be _faster_ for retrieval than filesystem storage.
This is based on too many assumptions regarding the database server's design and the operating system underlying the database server, and the server machine being used, and so I don't give it much credit. Then again, I may be wrong...
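To make the "files with their metadata in a database" idea concrete, here is a minimal sketch of what that could look like. It is only an illustration under stated assumptions: SQLite (via Python's standard sqlite3 module) stands in for the database servers named above, and the table layout, the column names, and the helper functions open_store, store_document, and fetch_document are all hypothetical, not anything from this thread. The point it shows is that the access check and the file lookup happen in one query, so there is no separate filesystem permission to get wrong.

```python
import sqlite3

# Hypothetical schema: one row per file, metadata and content side by side.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id           INTEGER PRIMARY KEY,
    filename     TEXT NOT NULL,
    content_type TEXT NOT NULL,
    owner        TEXT NOT NULL,
    data         BLOB NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the document store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_document(conn, filename, content_type, owner, data):
    """Insert a file and its metadata in a single transaction; return its id."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT INTO documents (filename, content_type, owner, data) "
            "VALUES (?, ?, ?, ?)",
            (filename, content_type, owner, data),
        )
    return cur.lastrowid


def fetch_document(conn, doc_id, user):
    """Return (filename, content_type, data) only if `user` owns the document.

    The ownership test is part of the WHERE clause, so a missing document and
    a document owned by someone else look the same to the caller: None.
    """
    row = conn.execute(
        "SELECT filename, content_type, data FROM documents "
        "WHERE id = ? AND owner = ?",
        (doc_id, user),
    ).fetchone()
    if row is None:
        return None
    filename, content_type, data = row
    return filename, content_type, bytes(data)
```

A web handler would call fetch_document with the authenticated user's name and answer with a "not found" response when it gets None back. A real application would likely need a richer permission model than a single owner column; that part is deliberately left out here.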
Our Signed & Secured application stores all files as BLOBs in a database for all of the transactional and backup capabilities, but we've never run tests of 100+ concurrent web users downloading files to see whether the database or the filesystem would be faster. In general, raw speed was less important to us than being able to support lots of concurrent requests, because retrieval from the db was always assumed to be faster than the data could be streamed back across typically slower Internet links. After all, the data has to be sent back to a user's web browser, so the speed of the transfer is limited by the slowest link between the browser and the web server.
This is the right attitude. Speed where it is useful, administrative efficiency whenever possible.

Ido
David
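The "slowest link" argument quoted above can be put into a few lines of arithmetic. This is a rough sketch, not a measurement: the function names are hypothetical, the stages are assumed to be pipelined so that the slowest one sets the pace, and the throughput figures in the usage note are made-up round numbers.

```python
def effective_throughput(*stage_rates):
    """Return the end-to-end rate (bytes/second) of a pipelined transfer.

    Each argument is the sustained rate of one stage, for example database
    retrieval or the client's Internet link. The slowest stage limits the rest.
    """
    if not stage_rates:
        raise ValueError("at least one stage rate is required")
    if any(rate <= 0 for rate in stage_rates):
        raise ValueError("stage rates must be positive")
    return min(stage_rates)


def transfer_seconds(size_bytes, *stage_rates):
    """Estimate how long it takes to deliver size_bytes through all stages."""
    return size_bytes / effective_throughput(*stage_rates)
```

For example, with an assumed 1,000,000-byte file, a database that delivers 20,000,000 bytes/s and a client link of 125,000 bytes/s, transfer_seconds(1_000_000, 20_000_000, 125_000) comes out to 8 seconds. Doubling the database rate to 40,000,000 bytes/s leaves that estimate unchanged, which is the sense in which retrieval speed matters little once the link is the bottleneck.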
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFAQQAhmhQsAkXAJP0RAtsIAJ0YEU2nqXhbrrEEbjuJ6ENNPnBuGwCgo1gS z2SccYIaCJwsvmk2bnpgZmw= =0tLv -----END PGP SIGNATURE-----