Terrors with PHP memory and file I/O
Posted on March 2nd, 2007
In my unending quest to eke more speed out of things, I took the PHP manual as gospel and believed it when it said that the assignment operator creates a new value in RAM.
Big mistake: it doesn’t.
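The behaviour behind this is copy-on-write: after $b = $a, both variables share a single string (or array), and PHP only duplicates it when one of them is modified. Below is a minimal sketch for current PHP that uses memory_get_usage() as a rough probe; the helper name copyOnWriteDemo() and the 5MB size are illustrative assumptions, not part of the original tests.

```php
<?php
// Copy-on-write probe. copyOnWriteDemo() is a hypothetical helper;
// memory_get_usage() reports the bytes currently allocated by the engine.

function copyOnWriteDemo(int $bytes): array
{
    $a = str_repeat('x', $bytes);   // one buffer of $bytes bytes
    $start = memory_get_usage();

    $b = $a;                        // no copy yet: $a and $b share one buffer
    $afterAssign = memory_get_usage();

    $b[0] = 'y';                    // first write forces PHP to duplicate the buffer
    $afterWrite = memory_get_usage();

    return [
        'assign'      => $afterAssign - $start,
        'write'       => $afterWrite - $afterAssign,
        'first_chars' => $a[0] . $b[0],   // "xy": the original was not touched
    ];
}

$delta = copyOnWriteDemo(5 * 1024 * 1024);
printf("growth after \$b = \$a:     %d bytes\n", $delta['assign']);
printf("growth after \$b[0] = 'y': %d bytes\n", $delta['write']);
```

On a typical build the first figure should be close to zero and the second roughly the size of the string; exact numbers vary with PHP version and allocator.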
That discovery essentially renders my RAM disk class useless. I had spent the past three nights designing an incredibly powerful class that accesses raw RAM using the new memory stream (php://memory) introduced in PHP 5.1, and I learned a lot in the process.
First off, the way the new memory stream accesses RAM is just awful. It seemed to hit ceilings at random. For example, when I ran tests using the memory stream to read a 5MB file 500 times, I ran out of RAM with a 16MB RAM buffer. How is this possible? Beats me. I ran out with 32MB too, and it would not work until I had a 128MB memory buffer. You would think that destroying the stream and creating a new one wasn’t really creating a new one, but no, it was: had every stream stayed alive, that would be 5MB * 500 iterations = 2.5GB, and I only have 1GB of RAM in this machine, so that explanation is out.
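One way to check whether each destroyed stream really releases its memory is to watch memory_get_peak_usage() across many create/fill/read/close cycles: if the buffers stayed alive, the peak would grow with the iteration count. A sketch under that assumption; memoryStreamRoundTrips() is a hypothetical helper, and the 1MB payload and 50 iterations are small stand-ins for the original 5MB file and 500 reads.

```php
<?php
// Repeatedly create, fill, read back and close a php://memory stream,
// then report the engine's peak memory. Sizes are illustrative only.

function memoryStreamRoundTrips(string $payload, int $iterations): int
{
    for ($i = 0; $i < $iterations; $i++) {
        $stream = fopen('php://memory', 'w+b');  // fresh in-RAM stream
        fwrite($stream, $payload);
        rewind($stream);                         // back to offset 0 before reading
        $copy = stream_get_contents($stream);
        fclose($stream);                         // frees the stream's buffer
        if ($copy !== $payload) {
            throw new RuntimeException("round trip $i returned different data");
        }
    }
    return memory_get_peak_usage();
}

$peak = memoryStreamRoundTrips(str_repeat('a', 1024 * 1024), 50);  // 1MB x 50
printf("peak engine memory: %.1f MB\n", $peak / (1024 * 1024));
```

If streams were leaking, the reported peak would be in the tens of megabytes here rather than a few.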
Another test, writing large data to memory, actually caused Ubuntu Linux to lock up and crash. If anyone wants to see that one, I’d be more than glad to send you the code; I backed it up.
The results of some tests follow, and they’re just scary. When repeatedly doing a lot of reads, file_get_contents() performs terribly. At 500 reads, file_get_contents() into the memory stream was 689 times slower than an fread() loop synced to the disk cluster size (7.8197538 vs. 0.0113458 for the 0x2000-byte loop into a string). I ran tests at several common cluster sizes on my ext3 partition, which uses 0x2000-byte clusters, and these were the results:
file_get_contents() and strings: 4.0085108
file_get_contents() and memory: 7.8197538
0x200 byte block loop and strings: 0.0198390
0x200 byte block loop and memory: 0.0272037
0x1000 byte block loop and strings: 0.0120730
0x1000 byte block loop and memory: 0.0121850
0x2000 byte block loop and strings: 0.0113458
0x2000 byte block loop and memory: 0.0159289
file_get_contents() and strings: 3.9356260
file_get_contents() and memory: 7.5591630
“Strings” refers to reading into a string-type variable; “memory” refers to raw writes to the memory stream. Only reading is timed.
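To make the labels concrete, here is a sketch of the two block-loop variants, assuming the loop simply appends fixed-size fread() blocks (0x200, 0x1000 or 0x2000 bytes) either to a string or to a php://memory stream. forEachBlock(), readBlocksToString() and readBlocksToMemory() are hypothetical helpers, not the original test code.

```php
<?php
// Calls $sink once per block. Each call opens its own handle, so every
// pass starts at offset 0 instead of at EOF from a previous pass.
function forEachBlock(string $path, int $blockSize, callable $sink): void
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        while (($chunk = fread($in, $blockSize)) !== false && $chunk !== '') {
            $sink($chunk);
        }
    } finally {
        fclose($in);
    }
}

// "Strings": append each block to an ordinary PHP string.
function readBlocksToString(string $path, int $blockSize): string
{
    $data = '';
    forEachBlock($path, $blockSize, function (string $chunk) use (&$data): void {
        $data .= $chunk;
    });
    return $data;
}

// "Memory": write each block into a php://memory stream, returned rewound.
function readBlocksToMemory(string $path, int $blockSize)
{
    $mem = fopen('php://memory', 'w+b');
    forEachBlock($path, $blockSize, function (string $chunk) use ($mem): void {
        fwrite($mem, $chunk);
    });
    rewind($mem);
    return $mem;
}

$path = tempnam(sys_get_temp_dir(), 'blk');
file_put_contents($path, str_repeat('0123456789abcdef', 4096));  // 64KB test file

$viaString = readBlocksToString($path, 0x2000);
$mem = readBlocksToMemory($path, 0x2000);
$viaMemory = stream_get_contents($mem);
fclose($mem);
unlink($path);

var_dump(strlen($viaString), $viaString === $viaMemory);  // int(65536) bool(true)
```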
As you can see, file_get_contents() performs terribly. So terribly that I would advise against using it … were it not for the next test. This time, I performed the same tests using a single pass over each file. Watch the results:
file_get_contents() and strings: 0.0073969
file_get_contents() and memory: 0.0150711
0x200 byte block loop and strings: 0.0202429
0x200 byte block loop and memory: 0.0235929
0x1000 byte block loop and strings: 0.0118038
0x1000 byte block loop and memory: 0.0122580
0x2000 byte block loop and strings: 0.0113141
0x2000 byte block loop and memory: 0.0114190
file_get_contents() and strings: 0.0070519
file_get_contents() and memory: 0.0148279
Wh-wh-what!? Yes, it appears it’s now file_get_contents() that runs like greased lightning. What happened? My hypothesis was that it must be the amount of data read, so I tried again using an 80MB file instead of a 5MB file:
file_get_contents() and strings: 0.1446838
file_get_contents() and memory: 0.5497040
0x200 byte block loop and strings: 0.3874199
0x200 byte block loop and memory: 0.4414949
0x1000 byte block loop and strings: 0.2321949
0x1000 byte block loop and memory: 0.2335169
0x2000 byte block loop and strings: 0.2134249
0x2000 byte block loop and memory: 0.2217171
file_get_contents() and strings: 0.1421670
file_get_contents() and memory: 0.5488090
I am completely at a loss here. First off, PHP’s direct pipeline to RAM underperforms the assignment operator every single time. This totally conflicts with what the PHP manual suggests. Secondly, the file_get_contents() function seems to be the absolute fastest way to read files, provided two things: you are reading into a variable, and you are reading the file only once. When multiple reads enter the picture … well, refer to the first example.
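A harness along these lines can check the single-pass vs. repeated-pass difference on another machine. It is a sketch that assumes wall-clock timing with microtime(true) and a 1MB random temporary file instead of the original 5MB and 80MB files; blockLoop() and timePasses() are hypothetical names. It reopens the file on every pass, because a loop that reused one handle without rewind() would hit EOF after the first pass and time nothing afterwards.

```php
<?php
// Reads a whole file with a fixed-size fread() loop into a string.
function blockLoop(string $path, int $blockSize): string
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot open $path");
    }
    $data = '';
    while (($chunk = fread($in, $blockSize)) !== false && $chunk !== '') {
        $data .= $chunk;
    }
    fclose($in);
    return $data;
}

// Runs $read $passes times and returns elapsed wall-clock seconds.
function timePasses(callable $read, int $passes): float
{
    $start = microtime(true);
    for ($i = 0; $i < $passes; $i++) {
        $read();
    }
    return microtime(true) - $start;
}

$path = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($path, random_bytes(1024 * 1024));   // 1MB stand-in file

foreach ([1, 100] as $passes) {
    printf(
        "%3d pass(es): file_get_contents() %.4fs, 0x2000 block loop %.4fs\n",
        $passes,
        timePasses(fn () => file_get_contents($path), $passes),
        timePasses(fn () => blockLoop($path, 0x2000), $passes)
    );
}
unlink($path);
```

Absolute numbers will differ from the results above and depend heavily on the operating system’s file cache.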
This is somewhat terrifying.
First off, the RAM disk class I spent three days writing is now apparently useless. Its only valid use would be if you needed a way to write to the memory stream without risking exceeding its bounds. But when raw memory is running behind PHP’s strings, why would you want to?
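For bounded in-memory writing specifically, PHP’s php://temp stream, which ships alongside php://memory, is one option to compare against: php://temp/maxmemory:NN keeps up to NN bytes in RAM and then moves the data to a temporary file on its own. A small sketch; the 1024-byte limit is only for illustration.

```php
<?php
// php://temp/maxmemory:NN holds up to NN bytes in RAM, then spills the
// data to a temporary file. The 1024-byte limit here is illustrative.

$stream = fopen('php://temp/maxmemory:1024', 'w+b');
fwrite($stream, str_repeat('z', 4096));   // 4KB, past the 1KB in-memory limit
rewind($stream);
$back = stream_get_contents($stream);
fclose($stream);

printf("read back %d bytes\n", strlen($back));   // read back 4096 bytes
```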
Nice to know those sleepless nights were well spent. Oh well, that’s what science is all about: killing hypotheses.
However, I am now motivated to write a stringfs-style class that would manage files within a string using the same method as before. The point is that you could then zoom through the string using the far friendlier fread() and fwrite() functions, rather than endless substr() calls and condensing black magic.
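The stringfs idea maps closely onto PHP’s userspace stream wrappers (stream_wrapper_register()), which let fopen(), fread(), fwrite() and fseek() operate on a class you write. Below is a minimal sketch under that assumption. The stringfs:// scheme, the StringFsStream class and its static array of strings are hypothetical names for illustration. It targets PHP 8 and skips permissions, directories, locking and strict append semantics.

```php
<?php
// Hypothetical "stringfs://" wrapper: every file is a PHP string kept in a
// static array, reached through PHP's userspace stream-wrapper interface.

final class StringFsStream
{
    /** @var array<string, string> file name => file contents */
    private static array $files = [];

    /** @var resource|null filled in by PHP with the stream context */
    public $context;

    private string $name = '';
    private int $pos = 0;

    public function stream_open(string $path, string $mode, int $options, &$openedPath): bool
    {
        $this->name = (string) substr($path, strlen('stringfs://'));
        $kind = $mode[0];                        // 'r', 'w', 'a', 'x' or 'c'
        $exists = isset(self::$files[$this->name]);

        if (($kind === 'r' && !$exists) || ($kind === 'x' && $exists)) {
            return false;
        }
        if ($kind === 'w' || !$exists) {
            self::$files[$this->name] = '';      // create, or truncate for 'w'
        }
        // Simplification: 'a' only starts at the end; later seeks are honoured.
        $this->pos = ($kind === 'a') ? strlen(self::$files[$this->name]) : 0;
        return true;
    }

    public function stream_read(int $count): string
    {
        $chunk = (string) substr(self::$files[$this->name], $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_write(string $data): int
    {
        // Zero-fill any gap left by seeking past the end, then overwrite in place.
        $contents = str_pad(self::$files[$this->name], $this->pos, "\0");
        self::$files[$this->name] = substr_replace($contents, $data, $this->pos, strlen($data));
        $this->pos += strlen($data);
        return strlen($data);
    }

    public function stream_tell(): int
    {
        return $this->pos;
    }

    public function stream_eof(): bool
    {
        return $this->pos >= strlen(self::$files[$this->name]);
    }

    public function stream_seek(int $offset, int $whence): bool
    {
        $target = match ($whence) {
            SEEK_SET => $offset,
            SEEK_CUR => $this->pos + $offset,
            SEEK_END => strlen(self::$files[$this->name]) + $offset,
            default  => -1,
        };
        if ($target < 0) {
            return false;
        }
        $this->pos = $target;
        return true;
    }

    public function stream_stat(): array
    {
        // Report a regular file of the current size (used by fstat()).
        return ['mode' => 0100644, 'size' => strlen(self::$files[$this->name])];
    }
}

stream_wrapper_register('stringfs', StringFsStream::class);

$fp = fopen('stringfs://notes.txt', 'w+');
fwrite($fp, 'hello world');
fseek($fp, 6);
echo fread($fp, 5), "\n";                              // world
fclose($fp);

echo file_get_contents('stringfs://notes.txt'), "\n";  // hello world
```

Because each file lives in an ordinary PHP string, reads hand back substrings and writes rebuild the string, so this sketch trades the substr() bookkeeping for a copy per write; whether that beats the memory stream would need the same kind of benchmark as above.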
Thoughts?