
How can I quickly create millions of tiny files in Linux?

Dave, I have searched the web over and can't seem to find an appropriate answer to this question. I have a need for a script that will create millions of 1-5Kbyte files in a Linux filesystem for testing purposes. I have to do this on a regular basis and it would be really cool to have it scripted. I am not sure what the quickest way to do this would be.

A bit weird as questions go, but I understand exactly what you're talking about, and the good news is that you can accomplish this task quickly and easily without having to have a "for i=1...1000000 do; touch x; done".

The trick is to use a command called split which, as the man page explains, "reads the given file (or standard input if no file is specified) and breaks it up into files of 1000 lines each."

How would you use this, then? Create a big file and use split with its useful "-b size" flag to create the millions of files you need. Let's step through the process...

Let's assume that you want to have, literally, a million 2.5K files. Now let's assume that you want to run the command one hundred times to produce this, which means that the source file needs to be 1/100th of 2.5 million kilobytes in size, or about 25 megabytes. (Yeah, I know, a K is 1024 bytes, not 1000 bytes.)

Go look at your log files: you might well have a file sitting in /var/log or a similar location that's already at least that size. I have a /var/log/messages file on my system that clearly needs to be trimmed - it's 381MB in size. Perfect!

You can find your own really big files by using find. Here's how I scrounged my server to find something really big:

$ sudo find . -type f -size +10000 -print

Typically, find works in terms of 512-byte blocks on the size specifier, so this search isn't looking for files that are greater than 10MB, but greater than about 5MB. No worries, though, it will invariably find a dozen or more files and one of 'em will be perfect for our needs.
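To make the sizing concrete, here is a minimal sketch of the arithmetic, taking 2.5K as 2560 bytes. The variable names and the SEARCH_ROOT override are made up for illustration, and the "c" (bytes) suffix on -size is used so there is no guessing about block sizes.

```shell
#!/bin/sh
# Sizing arithmetic for the example above (1K taken as 1024 bytes).
file_bytes=2560          # 2.5K per small file
total_files=1000000      # files wanted in all
runs=100                 # number of split runs

files_per_run=$((total_files / runs))          # 10000 files per run
source_bytes=$((files_per_run * file_bytes))   # 25600000 bytes, about 25MB
echo "source file per run: $source_bytes bytes"

# A bare number after -size counts 512-byte blocks, so +10000 means
# "more than 5120000 bytes". The c suffix counts bytes directly.
threshold_bytes=$((10000 * 512))
echo "-size +10000 matches files over $threshold_bytes bytes"

# Hypothetical search root; swap in whatever directory you want to scan.
search_root=${SEARCH_ROOT:-/var/log}
find "$search_root" -type f -size +"${source_bytes}c" -print 2>/dev/null || true
```

Any file this prints is big enough to feed one run; unreadable directories are simply skipped.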
If you can't find one that's quite the right size, create the size you need:

$ ls -lh ./private/var/vm/swapfile0
-rw------T 1 root wheel 64M 10 Feb 08:21 ./private/var/vm/swapfile0
$ cat "./private/var/vm/swapfile0" "./private/var/vm/swapfile0" > /tmp/BigFile
$ ls -l /tmp/BigFile
-rw-r--r-- 1 taylor wheel 134217728 10 Feb 09:57 /tmp/BigFile

Now we have a good-size file, one that'll cover your needs. Time to move into a subdirectory and split out oodles of subfiles:

$ ls | head
xaa xhe xoi xvm ycq yju yqy yyc zfg zmk
xab xhf xoj xvn ycr yjv yqz yyd zfh zml
xac xhg xok xvo ycs yjw yra yye zfi zmm
xad xhh xol xvp yct yjx yrb yyf zfj zmn
xae xhi xom xvq ycu yjy yrc yyg zfk zmo
xaf xhj xon xvr ycv yjz yrd yyh zfl zmp
xag xhk xoo xvs ycw yka yre yyi zfm zmq
xah xhl xop xvt ycx ykb yrf yyj zfn zmr
xai xhm xoq xvu ycy ykc yrg yyk zfo zms
xaj xhn xor xvv ycz ykd yrh yyl zfp zmt

Voila! I think you can run with it from here. Good luck.
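The split invocation itself doesn't appear in the listing above, so here is a minimal sketch of what that step could look like. It is an assumption, not the exact command used in the article: it works on a 1MB stand-in for BigFile in a scratch directory, cuts 2560-byte pieces, and uses -a 5 because split's default two-letter suffix only allows a few hundred names.

```shell
#!/bin/sh
# Work in a scratch directory so the pieces do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for BigFile: 1MB of zeros (the real one would be far larger).
dd if=/dev/zero of=BigFile bs=1024 count=1024 2>/dev/null

mkdir pieces
cd pieces || exit 1

# -b 2560 starts a new output file every 2560 bytes (2.5K).
# -a 5 gives five-letter suffixes, 26^5 possible names, room for millions.
split -b 2560 -a 5 ../BigFile

# Count the pieces and show the first few names.
piece_count=$(ls | wc -l | tr -d ' ')
echo "created $piece_count files"
ls | head
```

With a real multi-megabyte BigFile the same two flags apply; only the dd stand-in line goes away.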
Categorized:
Unix and Linux Help
(Article 3887)
But will you have enough inodes?
Posted by: wally at February 10, 2005 10:35 PM

Hi,
With reference to the article about creating many tiny files, if you don't want to search for big files, you can create your own big files using dd, like so:

    dd if=/dev/zero of=myfile bs=1024 count=1024

This will create a file called "myfile", of size 1MB. You can play around with the block size (bs) and block count (count) options to get the size you need. Then, you can use "split". You can also use "if=/dev/random" to get pseudo-random content in the file.
Arvind
Posted by: Arvind at February 11, 2005 3:16 AM

Wally, superb point. One way to see if you have enough inodes is to use the "df" command, of course. If I run "df" on my Mac OS X system, for example, I find out that my main drive has 4,232,612 inodes used and 8,624,197 inodes available, meaning that I'm using 33% of the inodes. If I want to make a million files then I want at least that many inodes free, but on this drive, I have 8.6 million. Plenty.
Posted by: Dave Taylor at February 11, 2005 4:11 AM

Thanks for the tip, Arvind. Somehow I always forget about "dd". And I've been using Unix for 25 years now. :-)
Posted by: Dave Taylor at February 11, 2005 4:13 AM

Thanks for the help! I know my question is somewhat strange but it's for a strange project. I have a customer with an expensive SAN system that has literally millions of files. A great number of these files are old and he wants to move the older files to a less expensive storage archive. I have written the script to find the files, build that list of files into a plain text file, then use the list to move and delete the files that were successfully moved. In order to test this, I needed a filesystem with literally millions of files. Your answer was exactly what I was looking for! Thanks again!
Created a 10M file with dd, then:

    split -b 5 -a 10 myfile

Posted by: Dave Dales at February 11, 2005 7:33 AM

What's with the addition problem? In my humble opinion, a perl script would be the best way to create bunches of tiny files on Linux. It's been years since I wrote any perl, but I'll just bet some perl guru could show perl to be the one really "right" solution.
Posted by: George Rogers Clark at February 11, 2005 1:04 PM

Since split can get its input from standard input, why not pipe a script into it? As an example, since I use gawk, I might do this:

    gawk 'BEGIN{for(i=0;i<100000;i++)print i;exit}' | split --lines=10

which would create 10000 10-line files.
Posted by: martin cohen at February 15, 2005 12:17 AM

Martin, you demonstrate what's so wonderful about Unix: there are always a dozen ways to solve any given problem. An ingenious solution, thanks.
Posted by: Dave Taylor at February 15, 2005 1:32 AM

Dear Dave: I read your answer about how to create a million files... I was thinking (maybe I didn't understand the question)... first of all, what's wrong with the first answer you propose? Second, why not a tiny script that starts other tiny scripts (recursively, I say), so you can have several shells doing the job in the end? Interesting question, just for the fun of it... (I remember when the people working in Basic put these problems and tried to resolve them in very few lines.) Thanks for your attention and your effort in resolving questions (and, please, excuse my bad English, as you can see this is not my maternal language!). Good regards,

Hi Carlos. Your idea of a shell script that recursively calls another shell script could work, but you'd have a lot of processes running, enough that it might well crash your system before you get the million files you seek.
Posted by: Dave Taylor at March 7, 2005 2:40 AM

This may be very time-consuming. After you've

Just a clarification, Dave. On Linux, you have to use the -i switch, as in df -i, to show inode information. I can't find the equivalent on Solaris 8, 9 or 10.
Posted by: Jim at May 8, 2007 12:13 PM

    yes | head -c 10m | split -b 1

Will give you 10 million 1 byte files.
Posted by: jason at June 26, 2007 11:04 AM

on Solaris

An easier way to do this is to use the dd command with a for loop to create many files at the command line. The example below creates a million files of 1 byte each.

    for((X=0;X<1000000;X+=1)); do dd if=/dev/zero of=file$X.log bs=1 count=1; done

http://www.fusebit.com/view/post:570
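Pulling a few of the suggestions above together, here is a minimal sketch that checks free inodes and then feeds split from a pipe. It assumes a Linux-style df (where column 4 of "df -Pi" output is the free inode count) and uses a made-up "tiny." prefix; the -a 4 is there because a two-letter suffix runs out of names long before a million files.

```shell
#!/bin/sh
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Free inodes on this filesystem (Linux df assumed; column 4 is IFree).
free_inodes=$(df -Pi . 2>/dev/null | awk 'NR==2 {print $4}')
echo "free inodes here: ${free_inodes:-unknown}"

# Feed split from a pipe instead of a file on disk.
# 5000 bytes in, 5 bytes per piece, so 1000 pieces.
# -a 4 allows 26^4 names; "-" means read standard input.
yes | head -c 5000 | split -b 5 -a 4 - tiny.

# Count the pieces that were written.
tiny_count=$(ls tiny.* | wc -l | tr -d ' ')
echo "created $tiny_count files"
```

Scale the byte count and the suffix length together: each extra suffix letter multiplies the number of available names by 26.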