WebApp Sec mailing list archives

Re: Controlling access to pdf/doc files (db "better" than filesystem?)


From: David Cameron <david () uberconcept com>
Date: Tue, 02 Mar 2004 07:42:19 +1100

My original point on the topic was the I fail to see what you gain by
using BLOBS. Returning to that, I'll leave a fuller explanation on SQL Server performance to the end.

By using BLOBs you do gain transactional support. However you must weigh
that against the disadvantages. Retrieving and storing those files is
much more difficult. Secondly it performs badly (at least in SQL
Server, can anyone else comment on Oracle/DB2/MYSQL?). Thirdly the
increase in size of the database due to storing files in the database
can cause performance of the database to decrease (once again SQL
Server). I would be surprised if other databases perform well when
working with BLOBs.

I might add that doing this does not necessarily add further security
(this is after all a security mailing list). If you want to make the
files less accessible, move them to a directory that is not accessible
from the web, and deny permissions for the account under which the web
application runs. When you want to access the files, change to another
account that has permissions to see that directory (create a specialised
account for this purpose) and then access the files. As for
transactional security, you could roll your own.

SQL Server Performance (ignore this if you aren't interested, please
correct me if I go offtrack):
The reason I am saying all this is that BLOB performance in SQL Server
is bad. SQL Server is optimised to insert, update and retrieve data that
is limited to 8k pages (the maximum size of a row is 8k), rather than data that is split over multiple pages. LOBs (text, nText, image) are split over a number of data pages as a BTree, to go to a specific location inside the LOB is a quick operation, but slow to add the LOB. Add to that the fact that SQL Server needs to allocate a new page for each 8k segment. This could force a demand for more space for the database, or may force a reorganisation internally. The impact of the space allocation issue could be limited by a good DBA, who ensured that a sufficient amount of space was free at all times. Even so my suspicion (it is a little hard to prove) is that this would not perform as well as adding files to a filesystem.

Recently I ran some tests against a SQL Server database to test INSERT
performance of adding BLOBs. The performance was 2-4 times worse when
using BLOBs. Performance tends to drop significantly with larger files.

I can't comment on the performance of SELECT/UPDATE statements, as I haven't tested against that. I suspect (once again untested) that it would be less than great, although the performance losses might be less.

I have looked high and low for test results/best practices
recommendations on BLOBs, without success. Most of the recommendations
come through comments from DBAs. The comment on the increase in size
(due to BLOBs) causing generally poor performance in the database came
through a comment from a DBA who had implemented BLOB storage and wished he hadn't after the effect.

There is an occasion to use BLOBs, and that is when either the files are
small or when there are other side benefits that outweigh the losses, for example storing .txt files in nText of text so that LIKE queries can be run against them. The closest I have come to a genuine piece of information coming from Microsoft on this relates to their terraserver project. Under the section titled "Learning from the TerraServer Design and Implementation", there are some comments on how Microsoft implemented their terraserver project. The comments suggest that performance is best when small files are being stored. In this case Microsoft split the image files into 9.5k chunks, ie it would take only two data pages per image.
http://www.microsoft.com/resources/documentation/sql/2000/all/reskit/en-us/part3/c1161.mspx

regards
David Cameron


Jannie Hanekom wrote:
I'm by no means an expert on the topic, but I'd like to point out that
in my Windows/SQL/IIS experience, database-stored files tend to be
significantly slower than their file system counterparts.  Databases
introduce significant overhead;  in addition to the layer of translation
between the database itself and the file system, there is also the layer
of processing that the client database driver introduces.  There's also
the fact that not the entire 8K page is used for data storage - some
overhead is introduced by the database system in use as well.

With regards to caching of files by the database in stead of the file
system, it will introduce quite a bit of overhead, as the process
rendering the file has to traverse the database driver to retrieve the
file;  also, your dba may not appreciate you flushing out a couple of MB
of crucial indexes every time you retrieve a large blob from the
database.  (How do databases deal with this?  Do they cache blobs at
all?  I've not done any testing on that...)

Also keep in mind that in most larger implementations, the database is
stored on a server separate from the actual web server, introducing a
network layer with its inherent latency into the mix.  Add a firewall
between the database and web server, and you can guess where this is
going...

Lastly, this approach wreaks havoc with the performance of solutions
where web-server clustering is used to add scalability.  Granted, this
is less of an issue with the comparatively low volume of traffic that
pdf/doc files are likely to produce, but it's pretty effective at
bringing a site to a crawl if high-frequency files (such as images) are
stored in a database (yes, it's actually done more frequently than you
might guess...)

The transactional integrity argument is a pretty strong one, but I have
reservations about the argument that storing files in the database is
more secure than on the web server itself.  As of late, there have been
far more SQL injection vulnerabilities than web server directory
traversal vulnerabilities, so my money would be on dynamically creating
and deleting the files or placing them in a secure area outside of the
web root.  It's much easier to audit file permissions than it is to find
injection vulnerabilities in a large amount of ASP/PHP code.

As always, each situation is unique and has its own merits.  From a
dba/server admin perspective, I'd prefer it if the dev team stored their
files on the file system and their data in the database, the way it was
intended.

Jannie

-----Original Message-----
From: Ido Rosen [mailto:ido () cs uchicago edu] Sent: 28 February 2004 20:55
To: David Wall @ Yozons, Inc.
Cc: webappsec () securityfocus com
Subject: Re: Controlling access to pdf/doc files (db "better" than
filesystem?)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 28 Feb 2004 11:13:21 -0800
"David Wall @ Yozons, Inc." <dwall () yozons com> wrote:


that  in SQL Server is that all data in SQL Server is split over ~8k


pages. When you add a BLOB it needs to be split into 8k chunks. When


you

But filesystems also store data into pages, often much smaller than 8k


chunk.


I agree that storing files with their metadata for such a solution in a
database is a better solution than storing files.  It's also probably
more secure, since the web developer is less likely to botch some
permissions, security, or sanity checks and since most database systems
already have some sanity checks built in.  Your reasoning in that last
sentence is a bit off, though:  Database systems (such as MySQL, PgSQL,
ThinkSQL, and MSSQL) all must use the filesystem, so their 8k chunks may
not match, and the storage may be out of phase.  This is just a result
of overlaying one file storage paradigm over another, and shouldn't
cause too much trouble speed-wise.  By adding a layer on top of the
filesystem, you do increase the likelihood of inefficiency.




Current thread: