Building an Image Gallery Blog with Symfony Flex: Data Testing
In the previous article, we demonstrated how to set up a Symfony project from scratch with Flex, and how to create a simple set of fixtures and get the project up and running.
The next step on our journey is to populate the database with a somewhat realistic amount of data to test application performance.
Note: if you did the “Getting started with the app” step in the previous post, you've already followed the steps outlined in this post. If that's the case, use this post as an explainer on how it was done.
As a bonus, we'll demonstrate how to set up a simple PHPUnit test suite with basic smoke tests.
More Fake Data
Once your entities are polished, and you've had your "That's it! I'm done!" moment, it's a perfect time to create a more significant dataset that can be used for further testing and preparing the app for production.
Simple fixtures like the ones we created in the previous article are great for the development phase, where loading ~30 entities is quick and can be repeated often while the DB schema is still changing.
Testing app performance, simulating real-world traffic and detecting bottlenecks requires bigger datasets (i.e. a larger amount of database entries and image files for this project). Generating thousands of entries takes some time (and computer resources), so we want to do it only once.
We could try increasing the COUNT constant in our fixture classes and see what happens:
// src/DataFixtures/ORM/LoadUsersData.php
class LoadUsersData extends AbstractFixture implements ContainerAwareInterface, OrderedFixtureInterface
{
    const COUNT = 500;
    ...
}

// src/DataFixtures/ORM/LoadGalleriesData.php
class LoadGalleriesData extends AbstractFixture implements ContainerAwareInterface, OrderedFixtureInterface
{
    const COUNT = 1000;
    ...
}
Now, if we run bin/refreshDb.sh, after some time we'll probably get a not-so-nice message like PHP Fatal error: Allowed memory size of N bytes exhausted.
Apart from slow execution, every error would result in an empty database, because the EntityManager is flushed only at the very end of the fixture class. Additionally, Faker downloads a random image for every gallery entry. For 1,000 galleries with 5 to 10 images per gallery, that would be 5,000 - 10,000 downloads, which is really slow.
There are excellent resources on optimizing Doctrine and Symfony for batch processing, and we're going to use some of those tips to optimize fixture loading.
First, we'll define a batch size of 100 galleries. After every batch, we'll flush and clear the EntityManager (i.e., detach persisted entities) and tell the garbage collector to do its job.
To track progress, let's print out some meta information (batch identifier and memory usage).
Note: after calling $manager->clear(), all persisted entities become unmanaged. The entity manager no longer knows about them, and you'll probably get an "entity not persisted" error. The key is to merge the entity back into the manager with $entity = $manager->merge($entity);
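For instance, if the gallery fixture keeps an entity from an earlier batch (or an earlier fixture) around, it has to be merged back before it can be associated with newly created entities. The snippet below is only an illustrative sketch; the 'user1' reference name and the setUser() call are assumptions and will look different in your own fixture classes:

// Illustrative sketch: re-attach a detached entity after $manager->clear().
// The 'user1' reference name and setUser() are assumptions for illustration.
$user = $this->getReference('user1'); // detached once the manager was cleared
$user = $manager->merge($user);       // merge() returns a managed copy
$gallery->setUser($user);             // now safe to associate with new entities
$manager->persist($gallery);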
Without the optimization, memory usage keeps growing while the LoadGalleriesData fixture class runs:
> loading [200] App\DataFixtures\ORM\LoadGalleriesData
100 Memory usage (currently) 24MB / (max) 24MB
200 Memory usage (currently) 26MB / (max) 26MB
300 Memory usage (currently) 28MB / (max) 28MB
400 Memory usage (currently) 30MB / (max) 30MB
500 Memory usage (currently) 32MB / (max) 32MB
600 Memory usage (currently) 34MB / (max) 34MB
700 Memory usage (currently) 36MB / (max) 36MB
800 Memory usage (currently) 38MB / (max) 38MB
900 Memory usage (currently) 40MB / (max) 40MB
1000 Memory usage (currently) 42MB / (max) 42MB
Memory usage starts at 24 MB and increases by 2 MB for every batch (100 galleries). If we tried to load 100,000 galleries, we'd need roughly 24 MB + 999 * 2 MB ≈ 2 GB of memory (999 more batches of 100 galleries, i.e. 99,900 additional galleries).
After adding $manager->flush() and $manager->clear() followed by gc_collect_cycles() for every batch, removing SQL logging with $manager->getConnection()->getConfiguration()->setSQLLogger(null), and removing entity references by commenting out $this->addReference('gallery' . $i, $gallery);, memory usage becomes roughly constant for every batch.
// Define batch size outside of the for loop
$batchSize = 100;

...

for ($i = 1; $i <= self::COUNT; $i++) {
    ...

    // Save the batch at the end of the for loop
    if (($i % $batchSize) == 0 || $i == self::COUNT) {
        $currentMemoryUsage = round(memory_get_usage(true) / 1024 / 1024);
        $maxMemoryUsage = round(memory_get_peak_usage(true) / 1024 / 1024);
        echo sprintf("%s Memory usage (currently) %dMB / (max) %dMB \n", $i, $currentMemoryUsage, $maxMemoryUsage);

        $manager->flush();
        $manager->clear();

        // here you should merge entities you're re-using with the $manager
        // because they aren't managed anymore after calling $manager->clear();
        // e.g. if you've already loaded category or tag entities
        // $category = $manager->merge($category);

        gc_collect_cycles();
    }
}
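One detail that isn't visible in the loop above is where the SQL logger gets turned off. Since it only needs to be disabled once, the call belongs at the top of the fixture's load() method. The sketch below shows the placement we're assuming; the setSQLLogger(null) call itself is the one mentioned earlier, and the rest of the method is elided:

public function load(ObjectManager $manager)
{
    // With SQL logging enabled, Doctrine keeps every executed query in memory,
    // which adds up quickly when inserting thousands of rows.
    $manager->getConnection()->getConfiguration()->setSQLLogger(null);

    // ... generate and persist entities in batches, as shown above
}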
As expected, memory usage is now stable:
> loading [200] App\DataFixtures\ORM\LoadGalleriesData
100 Memory usage (currently) 24MB / (max) 24MB
200 Memory usage (currently) 26MB / (max) 28MB
300 Memory usage (currently) 26MB / (max) 28MB
400 Memory usage (currently) 26MB / (max) 28MB
500 Memory usage (currently) 26MB / (max) 28MB
600 Memory usage (currently) 26MB / (max) 28MB
700 Memory usage (currently) 26MB / (max) 28MB
800 Memory usage (currently) 26MB / (max) 28MB
900 Memory usage (currently) 26MB / (max) 28MB
1000 Memory usage (currently) 26MB / (max) 28MB
Instead of downloading a random image for every entry, we can prepare 15 sample images and update the fixture script to pick one of them at random, rather than using Faker's $faker->image() method.
Let's take 15 images from Unsplash and save them in var/demo-data/sample-images.
Then, update the LoadGalleriesData::generateRandomImage method:
private function generateRandomImage($imageName)
{
    // Pre-downloaded sample images stored in var/demo-data/sample-images
    $images = [
        'image1.jpeg',
        'image10.jpeg',
        'image11.jpeg',
        'image12.jpg',
        'image13.jpeg',
        'image14.jpeg',
        'image15.jpeg',
        'image2.jpeg',
        'image3.jpeg',
        'image4.jpeg',
        'image5.jpeg',
        'image6.jpeg',
        'image7.jpeg',
        'image8.jpeg',
        'image9.jpeg',
    ];

    $sourceDirectory = $this->container->getParameter('kernel.project_dir') . '/var/demo-data/sample-images/';
    $targetDirectory = $this->container->getParameter('kernel.project_dir') . '/var/uploads/';

    // Pick a random sample image and copy it into the uploads directory
    // under a unique filename
    $randomImage = $images[rand(0, count($images) - 1)];
    $randomImageSourceFilePath = $sourceDirectory . $randomImage;
    $randomImageExtension = explode('.', $randomImage)[1];
    $targetImageFilename = sha1(microtime() . rand()) . '.' . $randomImageExtension;

    copy($randomImageSourceFilePath, $targetDirectory . $targetImageFilename);

    $image = new Image(
        Uuid::getFactory()->uuid4(),
        $randomImage,
        $targetImageFilename
    );

    return $image;
}
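Inside the gallery loop, this method then replaces the old Faker-based image download. Roughly, the call site might look like the sketch below; the addImage() method, the 5 to 10 image range, and the Faker-generated name are assumptions for illustration and depend on your actual entities:

// Illustrative sketch: attach 5-10 copied sample images to each gallery
// instead of downloading them. addImage() and $faker->sentence() are assumptions.
$imageCount = rand(5, 10);
for ($j = 1; $j <= $imageCount; $j++) {
    $image = $this->generateRandomImage($faker->sentence());
    $manager->persist($image);
    $gallery->addImage($image);
}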
It's a good idea to remove old files in var/uploads when reloading fixtures, so I'm adding an rm var/uploads/* command to the bin/refreshDb.sh script, immediately after dropping the DB schema.
Loading 500 users and 1000 galleries now takes ~7 minutes and ~28 MB of memory (peak usage).
Dropping database schema...
Database schema dropped successfully!
ATTENTION: This operation should not be executed in a production environment.
Creating database schema...
Database schema created successfully!
> purging database
> loading [100] App\DataFixtures\ORM\LoadUsersData
300 Memory usage (currently) 10MB / (max) 10MB
500 Memory usage (currently) 12MB / (max) 12MB
> loading [200] App\DataFixtures\ORM\LoadGalleriesData
100 Memory usage (currently) 24MB / (max) 26MB
200 Memory usage (currently) 26MB / (max) 28MB
300 Memory usage (currently) 26MB / (max) 28MB
400 Memory usage (currently) 26MB / (max) 28MB
500 Memory usage (currently) 26MB / (max) 28MB
600 Memory usage (currently) 26MB / (max) 28MB
700 Memory usage (currently) 26MB / (max) 28MB
800 Memory usage (currently) 26MB / (max) 28MB
900 Memory usage (currently) 26MB / (max) 28MB
1000 Memory usage (currently) 26MB / (max) 28MB
Take a look at the full source of the fixture classes: LoadUsersData.php and LoadGalleriesData.php.