Sebastian's Blog
From Data Extraction to API
Building a NodeJS API for extracted data management

In this post, I'll walk you through building a Node.js API for managing the job data scraped in my previous post. The API includes routes for jobs, plus endpoints that feed a pie chart and a bar chart built with Nivo. I'll explain how I set up the server, handle requests, interact with the database, and implement basic testing.
1. Initializing the Project
First, I initialized the Node.js project and installed the necessary dependencies: Express for the web server, body-parser for parsing request bodies, dotenv for environment variables, cors for handling cross-origin requests, and mysql2 for the database.
npm init
npm install express body-parser dotenv cors mysql2
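For reference, a minimal scripts section of package.json for this kind of setup could look like the following (the script names are my assumption, and jest is assumed installed for the tests shown later):

```json
{
  "scripts": {
    "start": "node app.js",
    "test": "jest"
  }
}
```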
2. Project Structure
The project is structured as follows:
- app.js: Main server setup. There isn't much logic here, so it stays simple.
- routes/: Directory containing the routes:
  - jobs.js: Handles job-related endpoints
  - pie-chart.js: Provides data for the pie chart
  - salary.js: Retrieves salary information
  - stats.js: Provides statistics like the salary range and most frequent locations
- database.js: Database connection setup
3. Database Connection
The database.js file sets up a connection pool to the MySQL database using the mysql2/promise library. This setup allows for efficient handling of multiple database requests.
// database.js
const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'root',
  password: 'mypassword',
  database: 'mydatabase'
});

module.exports = pool;
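Since dotenv is already installed, the credentials could come from a .env file instead of being hardcoded. A sketch of that variant, with variable names I chose for illustration:

```javascript
// database.js — hypothetical env-based variant
require('dotenv').config(); // loads DB_HOST, DB_USER, etc. from .env
const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  host: process.env.DB_HOST,     // e.g. DB_HOST=localhost
  user: process.env.DB_USER,     // e.g. DB_USER=root
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME
});

module.exports = pool;
```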
4. Writing Tests
I followed TDD to build this API: I wrote the tests first, then the code.
I used the supertest library to write the tests. They check the functionality of each endpoint, ensuring it returns the expected data and status codes.
const request = require('supertest');
const app = require('./app');

let server;

beforeAll(() => {
  server = app.listen(3001, () => {
    console.log('Server is running on port 3001');
  });
});

afterAll(done => {
  server.close(done);
});

describe('SeekCrawler API', () => {
  it('GET jobs', () => {
    return request(app)
      .get('/jobs')
      .expect('Content-Type', /json/)
      .expect(200)
      .then(response => {
        expect(response.body).toEqual(
          expect.arrayContaining([
            expect.objectContaining({
              titulo: expect.any(String)
            })
          ])
        );
      });
  });

  it('Should return job locations with counts greater than 5', () => {
    return request(app)
      .get('/pie-chart')
      .expect('Content-Type', /json/)
      .expect(200)
      .then(response => {
        expect(response.body).toEqual(
          expect.arrayContaining([
            expect.objectContaining({
              id: expect.any(String),
              label: expect.any(String),
              value: expect.any(Number)
            })
          ])
        );
      });
  });
});

/* Rest of the tests... */
5. Setting Up Express
In app.js, I set up the Express server. I use dotenv to manage environment variables, and cors to handle cross-origin requests.
// app.js
const express = require('express');
const app = express();
const db = require('./database');
require('dotenv').config();
const cors = require('cors');

// Cross-origin requests from the frontend on port 3001
app.use(cors({ origin: 'http://localhost:3001' }));
app.use(express.json());
app.use(express.urlencoded({ extended: true }));

// Routes
const jobsRouter = require('./routes/jobs');
const pieChartRouter = require('./routes/pie-chart');
const salaryRouter = require('./routes/salary');
const statsRouter = require('./routes/stats');

// URLs
app.use('/jobs', jobsRouter);
app.use('/pie-chart', pieChartRouter);
app.use('/salaries', salaryRouter);
app.use('/stats', statsRouter);

app.listen(3000, () => {
  console.log('Server is running');
});

module.exports = app;
Here, I define routes for jobs, stats, pie-chart, and salaries. Each route is handled by its respective router module.
6. Jobs Route
In routes/jobs.js, I define the /jobs endpoint, which fetches all the rows from the jobs table.
// jobs.js
const { Router } = require('express');
const db = require('../database');
const router = Router();

// db is a mysql2/promise pool, so query() returns a promise
// instead of taking a callback
router.get('/', async (req, res) => {
  try {
    const [results] = await db.query('SELECT * FROM jobs');
    res.json(results);
  } catch (err) {
    console.error('Database query error:', err);
    res.status(500).send('Internal Server Error');
  }
});

module.exports = router;
7. Pie Chart Route
The pie-chart.js file defines a route that provides the data for a Nivo pie chart based on job locations. I exclude locations with five or fewer jobs, just to keep the chart readable.
// pie-chart.js
const { Router } = require("express");
const db = require("../database");
const router = Router();

router.get("/", async (req, res) => {
  try {
    const [results] = await db.query(
      "SELECT ubicacion AS id, ubicacion AS label, COUNT(*) AS value FROM jobs GROUP BY ubicacion HAVING COUNT(*) > 5"
    );
    res.json(results);
  } catch (error) {
    console.error("Database query error:", error);
    res.status(500).send(`Internal Server Error: ${error.message}`);
  }
});

module.exports = router;
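For intuition, the grouping the SQL does can be restated as an in-memory function. This is a sketch with made-up rows; the real work happens in MySQL:

```javascript
// Group jobs by location and keep only locations with more than `minCount`
// entries, shaped as Nivo pie-chart data ({ id, label, value }).
function locationCounts(jobs, minCount = 5) {
  const counts = new Map();
  for (const job of jobs) {
    counts.set(job.ubicacion, (counts.get(job.ubicacion) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, value]) => value > minCount)
    .map(([ubicacion, value]) => ({ id: ubicacion, label: ubicacion, value }));
}

// Made-up sample rows: 6 jobs in Santiago, 2 in Valdivia
const jobs = [
  ...Array(6).fill({ ubicacion: 'Santiago' }),
  ...Array(2).fill({ ubicacion: 'Valdivia' })
];
console.log(locationCounts(jobs));
// → [ { id: 'Santiago', label: 'Santiago', value: 6 } ]
```

Valdivia is dropped because its count (2) does not exceed the threshold, mirroring the HAVING clause.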
8. Salary Route
In routes/salary.js, I define an endpoint that retrieves job titles and salaries for the Nivo bar chart.
// salary.js
const { Router } = require("express");
const db = require("../database");
const router = Router();

router.get("/", async (req, res) => {
  try {
    const [results] = await db.query(
      "SELECT titulo AS role, salario AS salary FROM jobs WHERE salario IS NOT NULL AND salario > 1000 LIMIT 200"
    );
    res.json(results);
  } catch (error) {
    console.error("Database query error:", error);
    res.status(500).send(`Internal Server Error: ${error.message}`);
  }
});

module.exports = router;
This route filters out salaries of 1000 or less (some posts list hourly rates, so this keeps only annual salaries) and returns them along with their job titles.
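The filter is simple enough to restate in plain JavaScript, shown here with invented rows for illustration:

```javascript
// Mirror of the SQL: keep rows where salario is set and above 1000,
// cap at 200 rows, and rename the columns for the bar chart.
function salaryRows(jobs, limit = 200) {
  return jobs
    .filter(job => job.salario != null && job.salario > 1000)
    .slice(0, limit)
    .map(job => ({ role: job.titulo, salary: job.salario }));
}

// Made-up sample rows
const jobs = [
  { titulo: 'Backend Dev', salario: 30000 },
  { titulo: 'Barista', salario: 12 },      // hourly rate, filtered out
  { titulo: 'QA Analyst', salario: null }  // no salary, filtered out
];
console.log(salaryRows(jobs));
// → [ { role: 'Backend Dev', salary: 30000 } ]
```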
9. Stats Route
The stats.js file defines a route that provides statistical information about the job data. It includes queries for the minimum and maximum salary and the most frequent job location.
// stats.js
const { Router } = require("express");
const db = require("../database");
const router = Router();

router.get("/", async (req, res) => {
  try {
    const salaryQuery = `
      SELECT
        MIN(salario) AS min_salary,
        MAX(salario) AS max_salary
      FROM jobs
      WHERE salario IS NOT NULL AND salario > 9999;
    `;
    const locationQuery = `
      SELECT
        ubicacion AS most_frequent_location,
        COUNT(*) AS max_count
      FROM jobs
      GROUP BY ubicacion
      ORDER BY max_count DESC
      LIMIT 1;
    `;
    const [salaryResults] = await db.query(salaryQuery);
    const [locationResults] = await db.query(locationQuery);
    const response = {
      min_salary: salaryResults[0].min_salary,
      max_salary: salaryResults[0].max_salary,
      most_frequent_location: locationResults[0].most_frequent_location,
      max_count: locationResults[0].max_count,
    };
    res.json(response);
  } catch (err) {
    console.error("Database query error:", err);
    res.status(500).send(`Internal Server Error: ${err.message}`);
  }
});

module.exports = router;
This route performs two queries: one to get the minimum and maximum salaries, and another to find the most frequent job location.
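As a sanity check, both queries can be mimicked in plain JavaScript over an in-memory array (illustrative only, with invented rows):

```javascript
// min/max salary (ignoring nulls and values under 10000, as in the SQL)
// plus the most frequent location and its count.
function jobStats(jobs) {
  const salaries = jobs
    .map(job => job.salario)
    .filter(s => s != null && s > 9999);

  const counts = new Map();
  for (const job of jobs) {
    counts.set(job.ubicacion, (counts.get(job.ubicacion) || 0) + 1);
  }
  // Highest count first, like ORDER BY max_count DESC LIMIT 1
  const [most_frequent_location, max_count] =
    [...counts.entries()].sort((a, b) => b[1] - a[1])[0];

  return {
    min_salary: Math.min(...salaries),
    max_salary: Math.max(...salaries),
    most_frequent_location,
    max_count
  };
}

// Made-up sample rows
const jobs = [
  { ubicacion: 'Santiago', salario: 25000 },
  { ubicacion: 'Santiago', salario: 60000 },
  { ubicacion: 'Concepción', salario: 500 } // below threshold, excluded
];
console.log(jobStats(jobs));
// → { min_salary: 25000, max_salary: 60000,
//     most_frequent_location: 'Santiago', max_count: 2 }
```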
Conclusion:
This Node.js API project demonstrates how to set up a server with Express, connect to a MySQL database, and handle the different requests the dashboard needs. It also includes tests to ensure the API works as expected.
In the next post, I'll show how I consume this API in a dashboard built with Next.js, using client and server components.
Stay tuned!