Industry guru Dave Taylor offers tech support on technical and business topics, including iPhone, iPod, Microsoft Windows, Sony PSP, cellphones, online advertising, CSS, Web design, business, Unix, Linux, SEO, Mac OS X, and shell script programming.     


How can I quickly create millions of tiny files in Linux?

Dave, I have searched the web over and can't seem to find an appropriate answer to this question. I have a need for a script that will create millions of 1-5Kbyte files in a linux filesystem for testing purposes. I have to do this on a regular basis and it would be really cool to have it scripted. I am not sure what the quickest way to do this would be?


Dave's Answer:

A bit weird as questions go, but I understand exactly what you're talking about and the good news is that you can accomplish this task quickly and easily without having to have a "for i=1...1000000 do; touch x; done".

The trick is to use a command called split. As the man page explains, "The split utility reads the given file (or standard input if no file is specified) and breaks it up into files of 1000 lines each."

How would you use this, then? Create a big file and use split with its useful "-b size" flag to create the millions of files you need. Let's step through the process...

Let's assume that you want to have, literally, a million 2.5K files. Now let's assume that you want to run the command one hundred times to produce this, which means that each run has to produce 10,000 files and the source file needs to be 1/100th of 2.5 million kilobytes in size, or about 25 megabytes. (Yeah, I know, a K is 1024 bytes, not 1000 bytes.)
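
The sizing arithmetic can be double-checked in the shell itself. This is only a sketch: the variable names are made up for illustration, and it assumes a "2.5K" file means 2560 bytes (a K counted as 1024 bytes).

```shell
#!/bin/sh
# Hypothetical figures based on the example numbers in the text.
total_files=1000000   # one million files in total
file_bytes=2560       # 2.5K per file, counting a K as 1024 bytes
runs=100              # how many times split will be run

# Each run has to produce this many files...
files_per_run=$((total_files / runs))
# ...so the source file for one run needs this many bytes.
source_bytes=$((files_per_run * file_bytes))
# Integer division, so this rounds down to whole megabytes.
source_mb=$((source_bytes / 1024 / 1024))

echo "files per run: $files_per_run"
echo "source file:   $source_bytes bytes (about $source_mb MB)"
```

Changing runs or file_bytes at the top is enough to size the source file for a different test.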

Go look at your log files: you might well have a file sitting in /var/log or a similar location that's already that size. I have a /var/log/messages file on my system that clearly needs to be trimmed - it's 381MB in size. Perfect!

You can find your own really big files by using find. Here's how I scrounged my server to find something really big:

$ sudo find . -type f -size +10000 -print

Typically, find works in terms of 512-byte blocks on the size specifier, so this search isn't looking for files that are greater than 10MB, but greater than about 5MB. No worries, though, it will invariably find a dozen or more files and one of 'em will be perfect for our needs. If you can't find one that's quite the right size, create the size you need:
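
One way to sidestep the 512-byte block confusion is to give find the size in bytes with the "c" suffix. The sketch below is just an illustration: it builds two throwaway files in a scratch directory (the names big.dat and small.dat are made up) and then searches for anything larger than 1 MiB.

```shell
#!/bin/sh
# Scratch directory so the search has something predictable to find.
demo_dir=$(mktemp -d)

# Two hypothetical test files: 2 MiB and 4 KiB of zeros.
dd if=/dev/zero of="$demo_dir/big.dat" bs=1024 count=2048 2>/dev/null
dd if=/dev/zero of="$demo_dir/small.dat" bs=1024 count=4 2>/dev/null

# The "c" suffix makes find count bytes instead of 512-byte blocks,
# so this matches files larger than 1048576 bytes (1 MiB).
big_files=$(find "$demo_dir" -type f -size +1048576c -print)
echo "$big_files"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

On a real system you would point find at a directory such as /var/log and raise the byte count to the size you are hunting for.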

$ ls -lh ./private/var/vm/swapfile0
-rw------T  1 root  wheel       64M 10 Feb 08:21 ./private/var/vm/swapfile0
$ cat "./private/var/vm/swapfile0" "./private/var/vm/swapfile0" > /tmp/BigFile
$ ls -l /tmp/BigFile
-rw-r--r--  1 taylor  wheel  134217728 10 Feb 09:57 /tmp/BigFile

Now we have a good-sized file, one that'll cover your needs. Time to move into a subdirectory, run split with the -b flag on the big file, and look at the oodles of subfiles that result:

$ ls | head
xaa     xhe     xoi     xvm     ycq     yju     yqy     yyc     zfg     zmk
xab     xhf     xoj     xvn     ycr     yjv     yqz     yyd     zfh     zml
xac     xhg     xok     xvo     ycs     yjw     yra     yye     zfi     zmm
xad     xhh     xol     xvp     yct     yjx     yrb     yyf     zfj     zmn
xae     xhi     xom     xvq     ycu     yjy     yrc     yyg     zfk     zmo
xaf     xhj     xon     xvr     ycv     yjz     yrd     yyh     zfl     zmp
xag     xhk     xoo     xvs     ycw     yka     yre     yyi     zfm     zmq
xah     xhl     xop     xvt     ycx     ykb     yrf     yyj     zfn     zmr
xai     xhm     xoq     xvu     ycy     ykc     yrg     yyk     zfo     zms
xaj     xhn     xor     xvv     ycz     ykd     yrh     yyl     zfp     zmt

Voila! I think you can run with it from here. Good luck.
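
For reference, here is a scaled-down sketch of the whole split step. It is not the exact command used above: the file count, the "f" prefix, the directory layout and the use of dd to build the source file are all assumptions chosen so that the example runs in a second or two.

```shell
#!/bin/sh
# Scaled-down, hypothetical numbers: 100 files of 2560 bytes each.
file_bytes=2560
file_count=100

work_dir=$(mktemp -d)
source_file="$work_dir/BigFile"
mkdir "$work_dir/out"

# Build a source file of exactly file_count * file_bytes bytes.
dd if=/dev/zero of="$source_file" bs="$file_bytes" count="$file_count" 2>/dev/null

# -b sets the size of each piece. -a 4 asks for four-letter suffixes
# (26^4 possible names) so split does not run out of file names.
# The last argument is the prefix for the output files.
split -b "$file_bytes" -a 4 "$source_file" "$work_dir/out/f"

created=$(ls "$work_dir/out" | wc -l | tr -d ' ')
echo "created $created files in $work_dir/out"

# Remove the scratch files with: rm -rf "$work_dir"
```

For millions of files, raise file_count and make sure the suffix length given to -a leaves room for that many names.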

Reader Comments To Date: 18

wally said, on February 10, 2005 10:35 PM:

but will you have enough inodes?

Arvind said, on February 11, 2005 3:16 AM:

Hi,

With reference to the article about creating many tiny files, if you don't want to search for big files, you can create your own big files using dd, like so:

dd if=/dev/zero of=myfile bs=1024 count=1024

This will create a file called "myfile", of size 1MB. You can play around with the block size (bs) and block count (count) options to get the size you need. Then, you can use "split". You can also use "if=/dev/random" to get pseudo-random content in the file.

Arvind

Dave Taylor said, on February 11, 2005 4:11 AM:

Wally, superb point. One way to see if you have enough inodes is to use the "df" command, of course. If I run "df" on my Mac OS X system, for example, I find out that my main drive has 4,232,612 inodes used and 8,624,197 inodes free, meaning that I'm using 33% of the inodes available. If I want to make a million files then I want at least that many free inodes, and on this drive I have 8.6 million. Plenty.
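
A rough sketch of that check as a script, assuming a Linux-style df where "df -P -i" prints the free inode count in the fourth column. The helper name enough_inodes is made up, and some filesystems do not report inode counts at all, which the script treats as "unknown".

```shell
#!/bin/sh
# Hypothetical helper: succeeds when free ($2) covers needed ($1).
enough_inodes() {
    [ "$2" -ge "$1" ]
}

needed=1000000

# Assumption: GNU-style df, where -i shows inodes and IFree is column 4.
free=$(df -P -i . 2>/dev/null | awk 'NR == 2 { print $4 }')

case $free in
    ''|*[!0-9]*)
        # Empty or non-numeric: this filesystem does not report a count.
        echo "free inode count not reported for this filesystem"
        ;;
    *)
        if enough_inodes "$needed" "$free"; then
            echo "ok: $free inodes free, $needed needed"
        else
            echo "not enough: $free inodes free, $needed needed"
        fi
        ;;
esac
```

Run it from inside the directory where the test files will be created, since df reports on the filesystem that holds the current directory.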

Dave Taylor said, on February 11, 2005 4:13 AM:

Thanks for the tip, Arvind. Somehow I always forget about "dd". And I've been using Unix for 25 years now. :-)

Dave Dales said, on February 11, 2005 7:33 AM:

Thanks for the help! I know my question is somewhat strange but it's for a strange project. I have a customer with an expensive SAN system that has literally millions of files. A great number of these files are old and he wants to move the older files to a less expensive storage archive. I have written the script to find the files, build that list of files into a plain text file, then use the list to move and delete the files that were successfully moved. In order to test this, I needed a filesystem with literally millions of files.

Your answer was exactly what I was looking for! Thanks again!

I created a 10M file with dd, then ran:

split -b 5 -a 10 myfile

George Rogers Clark said, on February 11, 2005 1:04 PM:

What's with the addition problem?

In my humble opinion, a perl script would be the best way to create bunches of tiny files on Linux. It's been years since I wrote any perl, but I'll just bet some perl guru could show perl to be the one really "right" solution.

martin cohen said, on February 15, 2005 12:17 AM:

Since split can get its input from standard input, why not pipe a script into it.

As an example, since I use gawk, I might do this:

gawk 'BEGIN{for(i=0;i<100000;i++)print i;exit}' | split --lines=10

would create 10000 10-line files.

Dave Taylor said, on February 15, 2005 1:32 AM:

Martin, you demonstrate what's so wonderful about Unix: there are always a dozen ways to solve any given problem. An ingenious solution, thanks.

carlos said, on March 7, 2005 1:22 AM:

Dear Dave:

I read your answer about how to create a million files...

I was thinking..(maybe I didn't understand the question)... first of all.. whats wrong with the first answer you propose? ... second.. why not a tiny script that start another... tiny scripts.. (recursively.. I say).. so you can have several shells doing the job..

in the end... interesting question.. just for the fun of it...(I remember when the people working in Basic,.. put this problems and try to resolve them in very few lines...)

thanks for your attention.. and your effort in resolve questions..

(and, please, excuse my bad english.. as you can see this is not my maternal language..!!!)

good regards,
carlos

Dave Taylor said, on March 7, 2005 2:40 AM:

Hi Carlos. Your idea of a shell script that recursively calls another shell script could work, but you'd have a lot of processes running, enough that it might well crash your system before you get the million files you seek.

k nard said, on April 27, 2005 10:34 PM:

This may be very time-consuming. After you've
done it the first time, try doing a *physical*
backup (i.e., a backup which doesn't need to
perform the system "open()" call for each file).
Then do a restore.
1. If the restore runs faster than the original
procedure of creating the files, use the backup
for future needs.
2. Depending on how your tests process the
files, you might be able to save even more time
by restoring only the directory/inodes.

Jim said, on May 8, 2007 12:13 PM:

Just a clarification, Dave.

On Linux, you have to use the -i, or df -i, switch to show inode information.

I can't find the equivalent on Solaris 8, 9 or 10.

jason said, on June 26, 2007 11:04 AM:

yes | head -c 10m | split -b 1

Will give you 10 million 1 byte files.

kjteoh said, on October 5, 2007 9:06 PM:

on Solaris
df -F ufs -o i

David said, on February 1, 2010 11:16 PM:

An easier way to do this is to use the dd command with a for loop to create many files at the command line. The example below creates a million files of 1 byte each.

for((X=0;X<1000000;X+=1)); do dd if=/dev/zero of=file$X.log bs=1 count=1; done

http://www.fusebit.com/view/post:570

Saurabh said, on May 31, 2010 7:03 AM:

Hi Dave,

Is there a way by which we can create huge files very quickly. the dd command takes a lot of time.
The requirement is to flood memory for a stress testing tool that I m tryin' to build.

thnx in advance
cheers

klode said, on May 3, 2011 8:35 PM:

Dave Taylor: Great use of split; I've never thought of using it that way. However no need to find/create an existing big file, just use /dev/zero. To generate 5M 4095-byte files (20,475,000,000 bytes total):
    dd bs=1000000 count=20475 if=/dev/zero | pv -W -s 20475000000 | split -b 4095 -a 7 -d - foo.
Using "pv" (pipe viewer) allows the impatient to monitor progress and ETA without having to camp on the file system with "watch df -i" or similar.

Dave Dales: I realize you still might want to use a script if you're doing a bunch of verification, but the original find/copy work can be done with one pipeline. Assuming you want to copy files older than two years:
    find srcdir -mtime +730 -print0 | cpio --null -pvd dstdir

George Rogers Clark: It depends on how you define "best way". Likely the most important issue here is speed: how to generate millions of files as quickly as possible. An interpreted, string-handling script language probably isn't the way to go here, but then again I've not done perl speed tests for this scenario.

Martin cohen: same concern about the speed of an interpreted, string-handling script language.

k nard: Good point, and I'd also test the speed of restoring from a compressed sector-level copy, similar to how Ghost/Acronis/Clonezilla would handle it.

jason: Dude, that's just sick :-). But thanks for teaching me about the "-c" option to "head".

David: instantiating 1M processes in a for loop will add a serious amount of time. Much better to use a single instance (of whatever command) when handling large numbers of files. This is one of the reasons that xargs was created.
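
A minimal sketch of the single-instance idea, assuming empty files are good enough for the test: awk prints the file names and xargs hands them to touch in large batches. The count, the file naming and the scratch directory are all hypothetical.

```shell
#!/bin/sh
# Scaled-down, hypothetical count; raise it once the approach is verified.
file_count=1000

out_dir=$(mktemp -d)

# awk prints one file name per line; xargs packs many names into each
# touch command, so touch starts a handful of times, not once per file.
awk -v n="$file_count" -v dir="$out_dir" \
    'BEGIN { for (i = 0; i < n; i++) printf "%s/file%d.log\n", dir, i }' |
    xargs touch

created=$(ls "$out_dir" | wc -l | tr -d ' ')
echo "created $created empty files in $out_dir"

# Remove the scratch files with: rm -rf "$out_dir"
```

Note that these files are zero bytes long rather than 1 byte each, so this only fits tests that care about the number of files, not their contents.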

Saurabh: if all you want to do is fill memory, don't bother actually creating the file. Instead use dd to read from a drive and discard:
    dd if=/dev/sda of=/dev/null
If the drive is some sort of RAID, reading an existing file may be much faster than writing a new one.
If you really just want to invalidate the in-memory buffers and cached data (and you're running Linux kernel 2.6.16 or later) clear them like this:
    sync; echo 3 > /proc/sys/vm/drop_caches

Hope this helps,
-klode
P.S. to Dave Taylor regarding ownership rights: I think I understand your restriction, but it's a bit broad for my taste. I don't grant you exclusive use, and I hope you'll be courteous and never claim that you originated the information in my post. If you can't abide by these requests, please delete my post. Thanks again for the information on your site.

Dave Taylor said, on May 3, 2011 9:39 PM:

Thanks for your extensive comment, klode. The ownership rights issue is on the recommendation of my attorney and is just to ensure that if I ever wanted to use any material from one of my blog posts or a comment (which I would, of course, always attribute properly) that I wouldn't risk subsequently having someone sue me for a copyright violation. No nefarious purpose at all.

© 2002 - 2013 by Dave Taylor. All Rights Reserved.

Note: This web site is for the purpose of disseminating information for educational purposes, free of charge, for the benefit of all visitors. We take great care to provide quality information. However, we do not guarantee, and accept no legal liability whatsoever arising from or connected to, the accuracy, reliability, currency or completeness of any material contained on this web site or on any linked site. Further, please note that by submitting a question or comment you're agreeing to my terms of service, which are: you relinquish any subsequent rights of ownership to your material by submitting it on this site. My lawyer says "Thanks".
"Ask Dave Taylor®" is a registered trademark of Intuitive Systems, LLC.